forked from ROCm/pytorch
[AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-08-22 #3
Closed
pragupta wants to merge 722 commits into rocm7.1_internal_testing from rocm7.1_internal_testing_IFU_2025-08-22
Conversation
Keep existing unbacked semantics unchanged; just use guard_or_false instead of guard_size_obl. Pull Request resolved: pytorch#160250 Approved by: https://github.com/ColinPeppler, https://github.com/jingsh
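A minimal sketch of the semantic difference, assuming a recent build that exposes `guard_or_false` from `torch.fx.experimental.symbolic_shapes` (the function named by the commit above; everything else here is illustrative):

```python
from torch.fx.experimental.symbolic_shapes import guard_or_false

def can_fold_dim(dim_size) -> bool:
    # True only when `dim_size == 1` is statically known; on an unbacked
    # SymInt this quietly answers False instead of raising a
    # data-dependent error.
    return guard_or_false(dim_size == 1)

print(can_fold_dim(1))  # True for a concrete size of 1
```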
Pull Request resolved: pytorch#160251 Approved by: https://github.com/jingsh, https://github.com/ColinPeppler ghstack dependencies: pytorch#160250
This reverts commit e0488d9. Reverted pytorch#160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](pytorch#160458 (comment)))
Which is manylinux2_28-compatible, even on the aarch64 platform. Archive contents and the URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works. Package `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel. Should fix pytorch#160425 Pull Request resolved: pytorch#160458 Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv
…ytorch#159790) This is a similar change to pytorch#153986, this time adding flags to the hipcc command under `cpp_extension.py`. The `-Wno-ignored-attributes` flag in particular avoids about 200MB of warning spam when building torchvision, like these:
```
In file included from D:\b\vision_main\torchvision\csrc\ops\hip\deform_conv2d_kernel.hip:72:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ATen.h:13:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/Functions.h:386:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax.h:21:
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax_ops.h:18:8: warning: __declspec attribute 'dllimport' is not supported [-Wignored-attributes]
   18 | struct TORCH_API _sparse_softmax_int {
      |        ^~~~~~~~~
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h:100:19: note: expanded from macro 'TORCH_API'
  100 | #define TORCH_API C10_IMPORT
      |                   ^~~~~~~~~~
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h:53:31: note: expanded from macro 'C10_IMPORT'
   53 | #define C10_IMPORT __declspec(dllimport)
      |                    ^~~~~~~~~
```
The `-fms-extensions` flag just seems beneficial to include: https://clang.llvm.org/docs/MSVCCompatibility.html. See also this downstream issue where these changes were tested: ROCm/TheRock#910. Pull Request resolved: pytorch#159790 Approved by: https://github.com/jeffdaily
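For context, a hedged sketch of passing the same flags from a regular extension build; the flag names come from the commit above, while the package and source names are hypothetical. On ROCm, `CUDAExtension` routes device compilation through hipcc, and `extra_compile_args["nvcc"]` feeds that compiler:

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="my_hip_ext",  # hypothetical package name
    ext_modules=[
        CUDAExtension(
            name="my_hip_ext",
            sources=["my_kernel.hip"],  # hypothetical HIP source file
            extra_compile_args={
                "cxx": [],
                # Suppress the dllimport warning spam and enable MS extensions,
                # mirroring what the commit adds to the default hipcc flags.
                "nvcc": ["-Wno-ignored-attributes", "-fms-extensions"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```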
Summary: as titled. This is requested by the zoomer team so they can add stack trace information to profiler results. Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan: Differential Revision: D80050233 Pull Request resolved: pytorch#160779 Approved by: https://github.com/angelayi
) Fixes a typo: this should be `dataclasses_json`. https://github.com/pytorch/pytorch/actions/runs/17000197828/job/48200676725#step:10:23 Pull Request resolved: pytorch#160796 Approved by: https://github.com/yangw-dev
Pull Request resolved: pytorch#160698 Approved by: https://github.com/huydhn ghstack dependencies: pytorch#160116
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: pytorch#160699 Approved by: https://github.com/pytorchbot
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: pytorch#160797 Approved by: https://github.com/pytorchbot
…pass (pytorch#158811) Pull Request resolved: pytorch#158811 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#158810
Set dynamo=True and enable fallback. 1. Implemented the compatible behavior where BytesIO objects are accepted as `f`. 2. Updated tests to explicitly set dynamo=False. pytorch#151693 Pull Request resolved: pytorch#159646 Approved by: https://github.com/titaiwangms
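A minimal sketch of the newly accepted pattern, assuming a build that includes this change (the model and shapes are illustrative):

```python
import io

import torch

model = torch.nn.Linear(4, 2)
buffer = io.BytesIO()

# Export with the dynamo-based exporter, writing the ONNX bytes into an
# in-memory BytesIO object instead of a file path.
torch.onnx.export(model, (torch.randn(1, 4),), buffer, dynamo=True)
print(len(buffer.getvalue()), "bytes of ONNX written")
```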
Fixes pytorch#160650. I added a type-ignore comment to the `LeafSpec` class inheritance in `torch/utils/_cxx_pytree.py` to handle `PyTreeSpec` being marked as final in optree's type stubs. Pull Request resolved: pytorch#160652 Approved by: https://github.com/Skylion007
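A sketch of the shape of that fix, assuming optree's stubs mark `PyTreeSpec` as `@final` while runtime subclassing still works (the class body here is illustrative, not the real `_cxx_pytree` code):

```python
import optree

# The stubs forbid subclassing a @final class, so the checker needs the
# suppression even though the subclass is fine at runtime.
class LeafSpec(optree.PyTreeSpec):  # type: ignore[misc]
    pass
```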
…0635) My proposal here is to use GitHub Dependabot to make sure that the `transformers` version used in CI is always up-to-date. To achieve this, this PR does 2 things: 1. Pin the `transformers` version across all CI jobs in only one place, `.ci/docker/ci_commit_pins/huggingface.txt`. This file is now a regular pip requirements file instead of a pinned-commit text file. There isn't any need to pin `transformers` to a specific commit, and the file already refers to a stable version `v4.54.0`. 2. Create `.github/dependabot.yml` to configure the bot to update `transformers` automatically when there is a new version. Those labels will ensure that the right reviewers from torch.compile and Dev Infra are notified. I'm not sure how to test this out in a PR, but it feels ok to land and test this in main. If this works, we should see a PR to update `v4.54.0` to the current latest `v4.55.0`. ### Reference https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference Pull Request resolved: pytorch#160635 Approved by: https://github.com/ZainRizvi
… add aten.sym_is_contiguous. (pytorch#159197) This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous(), but we want to find those call sites and handle this properly by explicitly calling is_contiguous_or_false(), not is_contiguous(), where appropriate. I had to fix one issue after removing the implicit size-oblivious reasoning. Here is the context: in pytorch#157472 we defined sym_is_contiguous to be the function computing contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE. When people call is_contiguous, we do sym_is_contiguous().guard_bool(); when people call is_contiguous_or_false, we do sym_is_contiguous().guard_or_false(). One path not handled well was this:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }
  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom and matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format). This went through load_pyobj_interpreter and ended up calling the Python is_contiguous, which used implicit size-oblivious reasoning. Once we removed that implicit size-oblivious reasoning, the right thing is to call return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get a DDE even if the caller is doing sym_is_contiguous. So I had to define it for the pyinterpreter, and then I had to override it for nested tensors. Pull Request resolved: pytorch#159197 Approved by: https://github.com/ezyang
Differential Revision: D80201622 Pull Request resolved: pytorch#160599 Approved by: https://github.com/bdhirsh
…unner-mypy` (pytorch#160806) Like `MYPY`, linter `MYPYSTRICT` will need `--all-files` too. See also: - pytorch#160652 (comment) Pull Request resolved: pytorch#160806 Approved by: https://github.com/seemethere
Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of the tlparse artifact. Pull Request resolved: pytorch#160132 Approved by: https://github.com/xmfan ghstack dependencies: pytorch#160260
pytorch#159902) Pull Request resolved: pytorch#159902 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483
…ytorch#159864) Pull Request resolved: pytorch#159864 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483, pytorch#159902
…ts (pytorch#159865) Changes: (1) Replace UserDefinedSetVariable with UserDefinedObjectVariable in all binop calls. Test plan: (1) The three tests from CPython `test_collections.py` ensure that Dynamo can trace through a dunder method (e.g. __add__, __ixor__, etc.) defined in a user-defined class. Pull Request resolved: pytorch#159865 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483, pytorch#159902, pytorch#159864
…0132)" This reverts commit 2603e40. Reverted pytorch#160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/2603e40be5fa4a66301e6654e34a82a67f2e4913). Land race with another PR that changed some had_cuda-related things ([comment](pytorch#160132 (comment)))
…#160747) Summary: Inductor's 3.4 Triton release is the most commonly used variant of Triton, but if someone is working with an alternative version of Triton this may not match. This moves the version check from Triton 3.4 to any variant that has support for the TMA APIs. Test Plan: Testing the previously failing test `inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda` Rollback Plan: Differential Revision: D80348643 Pull Request resolved: pytorch#160747 Approved by: https://github.com/NikhilAPatel
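A hedged sketch of capability-probing rather than version-pinning; the import path below is an assumption based on Triton 3.4-era layouts, not taken from the commit:

```python
def has_tma_descriptor_api() -> bool:
    # Gate on whether the installed Triton exposes the TMA descriptor API
    # at all, instead of comparing against an exact release number.
    try:
        from triton.tools.tensor_descriptor import TensorDescriptor  # noqa: F401
    except Exception:
        return False
    return True

print(has_tma_descriptor_api())
```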
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: pytorch#160831 Approved by: https://github.com/pytorchbot
Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of the tlparse artifact. Pull Request resolved: pytorch#160132 Approved by: https://github.com/xmfan
To a commit containing pytorch/tensorpipe#464 that fixes compilation with CUDA-13 Fixes pytorch#160104 Pull Request resolved: pytorch#160808 Approved by: https://github.com/nWEIdia, https://github.com/Skylion007, https://github.com/malfet
…ytorch#160747)" This reverts commit 8f43454. Reverted pytorch#160747 on behalf of https://github.com/malfet due to Looks like this breaks rocm, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=rocm%20%2F%20linux-jammy-rocm-py3.10 ([comment](pytorch#160747 (comment)))
Remove CONDA_CMAKE from `.ci/docker/build.sh` Pull Request resolved: pytorch#160832 Approved by: https://github.com/malfet
Purely a refactor: improve typing and get rid of some type errors. Mark certain fields as non-null, since in general they're not empty. The goal of this stack of PRs is to move the save/load logic of guard serialization into separate, flat phases, instead of being embedded in guard creation. This way, we can put a try/catch around it and fail safely if certain guards are not serializable. Pull Request resolved: pytorch#160530 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007
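A schematic sketch of the structure this stack moves toward; all names are illustrative stand-ins, not the real Dynamo internals:

```python
import pickle

def serialize_guards(guards):
    # Stand-in for the real guard serializer; may raise for guards that
    # cannot be serialized.
    return pickle.dumps(guards)

def create_guards_then_serialize(guards):
    # Serialization runs as a flat phase *after* guard creation, so a
    # failure here degrades to "no cached guards" instead of aborting.
    serialized = None
    try:
        serialized = serialize_guards(guards)
    except Exception:
        pass  # fail safely: skip caching this entry, keep compiling
    return guards, serialized

print(create_guards_then_serialize([("L['x']", "TENSOR_MATCH")]))
```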
Because numpy 1.22.4 reached EOL three years ago. Pull Request resolved: pytorch#160836 Approved by: https://github.com/malfet
…addmm (pytorch#155357)" This reverts commit ce048de. Reverted pytorch#155357 on behalf of https://github.com/seemethere due to This is causing buck builds to fail since we didn't add the definition of AT_USE_EIGEN_SPARSE in the buckbuild.bzl file, will follow-up and re-land this. ([comment](pytorch#155357 (comment)))
Bumps [uv](https://github.com/astral-sh/uv) from 0.8.4 to 0.8.6. - [Release notes](https://github.com/astral-sh/uv/releases) - [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md) - [Commits](astral-sh/uv@0.8.4...0.8.6) --- updated-dependencies: - dependency-name: uv dependency-version: 0.8.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ch#160205) Parallelize reading of data behind thread_count argument to HFStorageReader Test plan: ensure existing tests pass and run a job successfully with these changes Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) Pull Request resolved: pytorch#160205 Approved by: https://github.com/meetv18
Summary: as titled; changed one of the tests to get rid of the torcharrow dep. Test Plan:
```
buck2 test //caffe2/test/cpp/nativert:layout_planner_tests
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```
Rollback Plan: Reviewed By: SherlockNoMad Differential Revision: D80108549 Pull Request resolved: pytorch#160942 Approved by: https://github.com/georgiaphillips, https://github.com/henryoier
This fixes an assertion we were running into in memory planning about not having an acyclic graph. The repro is very long, so it's hard to make a local test of it, but this fixes the repro I am looking at. Pull Request resolved: pytorch#161205 Approved by: https://github.com/IvanKobzarev, https://github.com/bdhirsh
…61185) Summary: Removed `Model`; it's not being used anywhere, so it's safe. Removed the `tensor_paths` and `constant_paths` fields in `ExportedProgram`. - BC: when the current deserializer loads a previously serialized EP (that comes with empty `tensor_paths` and `constant_paths`), it will just ignore those two fields. - FC: when the old deserializer loads a newly serialized EP (that doesn't come with `tensor_paths` and `constant_paths`), it will also ignore those two fields in `_dict_to_dataclass()`. Differential Revision: D80725094 Pull Request resolved: pytorch#161185 Approved by: https://github.com/SherlockNoMad
Pull Request resolved: pytorch#161168 Approved by: https://github.com/mikaylagawarecki, https://github.com/Skylion007
…ytorch#160373) Following up on pytorch#152951 (comment), this removes a few lines added in that pull request, fixing link errors like
```
[7019/7028] Linking CXX shared library bin\torch_hip.dll
FAILED: [code=4294967295] bin/torch_hip.dll lib/torch_hip.lib
C:\Windows\system32\cmd.exe /C "cd . && D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_dll --msvc-ver=1942 --intdir=caffe2\CMakeFiles\torch_hip.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\rc.exe --mt=C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\Llvm\x64\bin\llvm-mt.exe --manifests -- D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\lld-link.exe /nologo @CMakeFiles\torch_hip.rsp /out:bin\torch_hip.dll /implib:lib\torch_hip.lib /pdb:bin\torch_hip.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO && cd ."
LINK: command "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\lld-link.exe /nologo @CMakeFiles\torch_hip.rsp /out:bin\torch_hip.dll /implib:lib\torch_hip.lib /pdb:bin\torch_hip.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: error: undefined symbol: __declspec(dllimport) class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::native::transform_bias_rescale_qkv_cuda(class at::Tensor const &, class at::Tensor const &, __int64)
>>> referenced by caffe2\CMakeFiles\torch_hip.dir\__\aten\src\ATen\RegisterCUDA_0.cpp.obj:(class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::`anonymous namespace'::`anonymous namespace'::wrapper_CUDA___transform_bias_rescale_qkv(class 0xE9BF7323::Tensor const &, class 0xE9BF7323::Tensor const &, __int64))
>>> referenced by caffe2\CMakeFiles\torch_hip.dir\__\aten\src\ATen\RegisterNestedTensorCUDA_0.cpp.obj:(class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::`anonymous namespace'::`anonymous namespace'::wrapper_NestedTensorCUDA___transform_bias_rescale_qkv(class 0xEFEB5304::Tensor const &, class 0xEFEB5304::Tensor const &, __int64))
```
The `native_transformers_hip_hip` and `native_transformers_hip_cpp` sources are okay to define (and are required) even if accelerated versions of these operations are not available. I've tested downstream builds of torch with ROCm on native Windows via https://github.com/ROCm/TheRock both with and without aotriton, and these changes were needed for the build to succeed in both cases. I have _not_ tested Linux, WSL, or the HIP SDK. Pull Request resolved: pytorch#160373 Approved by: https://github.com/alugorey, https://github.com/jeffdaily
Note: Adding a unit test for this is tricky, as having errors in the specific unit test would cause test_utils.py to crash altogether. Tested as follows: 1. Added x = 1/0 after guarded_code = compile_inner(code, one_graph, hooks, transform) in convert_frame.py 2. Printed exception_stack_trace and got: ['Traceback (most recent call last):\n File "/data/users/jovian/pytorch/torch/_dynamo/convert_frame.py", line 1207, in _compile\n x = 1/0\n ~^~\nZeroDivisionError: division by zero\n'] Pull Request resolved: pytorch#161096 Approved by: https://github.com/c00w
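A minimal standalone sketch of the same capture pattern, using only the standard library:

```python
import traceback

exception_stack_trace = []
try:
    x = 1 / 0  # stand-in for the injected failure in convert_frame.py
except ZeroDivisionError:
    # Capture the formatted traceback as a string, mirroring the
    # printed exception_stack_trace above.
    exception_stack_trace.append(traceback.format_exc())

print(exception_stack_trace)
```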
…59233) Fixes pytorch#158076 Basically, the gemm template generates code like ``` cpp_CppMicroGemmRef_micro_gemm<static_cast<bool>(false), static_cast<bool>(false)>( &(X[static_cast<int64_t>(k_start + 196LL*m_start + 38416LL*ks_b_index)]), &(W[static_cast<int64_t>(200704000LL + n_start + 80LL*k_start + 15680LL*ks_b_index)]), &(local_acc_buf[static_cast<int64_t>(Nr*nci + ((-1LL)*Nr*nc))]), static_cast<int64_t>(m_end + ((-1LL)*m_start)), static_cast<int64_t>(Nr), static_cast<int64_t>(k_end + ((-1LL)*k_start)), static_cast<int64_t>(196LL), static_cast<int64_t>(80LL), static_cast<int64_t>(Nc_blocks*Nr) ); ``` However, when the input tensor W has a storage offset, this results in a double offset issue. That is, the resulting pointer is `2 * 200704000LL` away from `W.storage().data_ptr()`, which causes an out-of-bounds access. The storage offset of `W` is introduced by [this patch](https://github.com/pytorch/pytorch/pull/136421/files), but I think it's a reasonable fix. So `cpp_gemm_template.py` should handle input matrices with storage offsets properly. I think a good way to fix this issue is to create a new matrix that has no storage offset. When `should_block_weights` is true, `block_weight()` creates a clean new matrix, so that branch is not affected by this issue. BTW I've also examined the FX IRs generated by `torch.compile()`, as well as the generated python module, and they are correct. The newly-added test in `test_cpu_select_algorithm.py` can reproduce the issue. With this patch, the crash is fixed. It also resolves the crash reported in pytorch#158076. I ran CPU tests in `test_cpu_select_algorithm.py`, but many of them are skipped due to MKL and AMX. I'd be appreciated if someone can help verify the test. Pull Request resolved: pytorch#159233 Approved by: https://github.com/leslie-fang-intel, https://github.com/swolchok
…#161203) Summary: We use tempfile.NamedTemporaryFile to create a temporary pt2 file in `test_nativert.py`. However, it is not recognized as an allowed file format and a warning will be thrown. Test Plan: CI Rollback Plan: Differential Revision: D80740916 Pull Request resolved: pytorch#161203 Approved by: https://github.com/angelayi
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: pytorch#161226 Approved by: https://github.com/pytorchbot
…orch#161036) Fixes silent incorrectness for autograd function tracing, where we rely on FakeTensor metadata (requires_grad) to determine whether to HOP or not: https://github.com/pytorch/pytorch/blob/5ee464db5c4293ac09521f9069fa7d2106680a7f/torch/_dynamo/variables/misc.py#L671 Stared at this with @anijain2305 yesterday: `Tensor.__setitem__` can update tensor metadata, and we can just run the fake prop and extract the output metadata from the updated FakeTensor. FIXES pytorch#160901. It should also be the root cause behind the issue in pytorch/torchtitan#1604 @bdhirsh @ruisizhang123 Pull Request resolved: pytorch#161036 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#160805
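A minimal sketch of the behavior the fix accounts for, using plain eager tensors:

```python
import torch

x = torch.zeros(3)                      # requires_grad=False
y = torch.randn(3, requires_grad=True)

# In-place write of a grad-tracking value: __setitem__ mutates the target's
# autograd metadata, so fake-prop must re-run to observe the new state.
x[0] = y[0]
print(x.requires_grad)                  # True: metadata changed by __setitem__
```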
Pull Request resolved: pytorch#160583 Approved by: https://github.com/huydhn, https://github.com/atalman
…rch#161137) It doesn't make sense to have this default to Maxwell, which is too old; all other places in CI/CD need to overwrite this value. IMO, it makes more sense to not set this at all and let CI/CD jobs set it for their own use cases instead. This is partly responsible for the build failure in pytorch#160988. Pull Request resolved: pytorch#161137 Approved by: https://github.com/msaroufim
Optimize the [zero_grad doc](https://docs.pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) format and description. ## Test Result ### Before (screenshot of the previous rendering) ### After (screenshot of the updated rendering) Pull Request resolved: pytorch#161239 Approved by: https://github.com/janeyx99
…#161196) Enable maximum MSVC compatibility for oneAPI headers. The key context is `The /permissive- option is compatible with almost all of the header files from the latest Windows Kits`, from https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170 Pull Request resolved: pytorch#161196 Approved by: https://github.com/jansel
Changes: 1. Math-related build options are not supported by MSVC; skip them on Windows. 2. Move all math-related build options to the `_get_ffast_math_flags` function. Pull Request resolved: pytorch#161197 Approved by: https://github.com/jansel
…orch#161159) Pull Request resolved: pytorch#161159 Approved by: https://github.com/eellison
# Motivation pytorch#160505 enables background threads for XPU host allocator. However, it will hang on Windows during program exit. Now disable it until we narrow down the issue. Pull Request resolved: pytorch#161242 Approved by: https://github.com/EikanWang
Removes a redundant if statement. Does not impact logic so no test changes needed. Pull Request resolved: pytorch#161215 Approved by: https://github.com/StrongerXi
…58568) Adds support for FlightRecorder in ProcessGroupXCCL. See intel/torch-xpu-ops#1867 for XCCL implementation and more details. Pull Request resolved: pytorch#158568 Approved by: https://github.com/guangyey, https://github.com/fduwjj
…#161043) As the title states. Pull Request resolved: pytorch#161043 Approved by: https://github.com/Skylion007
Pull Request resolved: pytorch#159361 Approved by: https://github.com/anijain2305
Add magma build 13.0 for Windows. Add cuda_install.bat 13.0 for the Windows build. pytorch#159779 Pull Request resolved: pytorch#161073 Approved by: https://github.com/atalman Co-authored-by: Andrey Talman <[email protected]>
pytorch#159779 CUDA 13.0.0, NVSHMEM 3.3.20, CUDNN 9.12.0.46. Adding x86 Linux builds for CUDA 13. Adding libtorch docker. Package naming changed for CUDA 13 (removed the -cu13 postfix for some packages). Preparation checklist: 1. Update the index https://download.pytorch.org/whl/nightly/cu130 with pypi packages 2. Update the packaging name based on https://pypi.org/project/cuda-toolkit/ metadata Pull Request resolved: pytorch#160956 Approved by: https://github.com/atalman Co-authored-by: atalman <[email protected]>
This reverts commit 523bffd. Reverted pytorch#149218 on behalf of https://github.com/atalman due to Lets not use no-cache flags on test binaries ([comment](pytorch#149218 (comment)))
…sting_IFU_2025-08-22
# Conflicts:
#	.ci/docker/requirements-ci.txt
#	aten/src/ATen/Context.cpp
#	aten/src/ATen/cuda/tunable/GemmHipblaslt.h
#	aten/src/ATen/native/Normalization.cpp
#	aten/src/ATen/native/cuda/Blas.cpp
#	requirements.txt
#	test/distributed/_tools/test_fsdp2_mem_tracker.py
#	test/dynamo/test_activation_checkpointing.py
#	test/dynamo/test_structured_trace.py
#	test/inductor/test_combo_kernels.py
#	test/test_matmul_cuda.py
#	torch/_higher_order_ops/triton_kernel_wrap.py
#	torch/_inductor/choices.py
#	torch/_inductor/codegen/triton.py
#	torch/testing/_internal/common_cuda.py
Merged latest changes from upstream/main into rocm7.1_internal_testing on 2025-08-22