[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build #156181

Open
@atalman

Description

🐛 Describe the bug

PR: #155748

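We are observing the following errors during the Windows CUDA 12.9.1 wheel build: "LLVM ERROR: out of memory", followed by "nvcc error : '""%CICC_PATH%\cicc"' died with status 0xC0000409". The failure reproduces on every retry while compiling SegmentReduce.cu: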
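For readers unfamiliar with Windows exit codes, the sketch below decodes the status reported for cicc. The lookup table is a small, hand-picked subset of values from ntstatus.h, and the helper name `describe_ntstatus` is hypothetical. Note that 0xC0000409 maps to STATUS_STACK_BUFFER_OVERRUN, which Windows also reports for fail-fast process termination. It may therefore simply reflect cicc aborting after the out-of-memory error rather than an actual stack problem, but that reading is an assumption and not something the log confirms.

```python
# Illustrative lookup of a few well-known NTSTATUS values (from ntstatus.h).
# The table is deliberately partial; anything else is reported as unknown.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_ntstatus(code: int) -> str:
    """Return the symbolic name for a 32-bit NTSTATUS value, if known."""
    code &= 0xFFFFFFFF  # normalize signed (negative) exit codes to unsigned
    return KNOWN_NTSTATUS.get(code, f"unknown NTSTATUS 0x{code:08X}")


if __name__ == "__main__":
    # Status reported by nvcc for cicc in this issue.
    print(describe_ntstatus(0xC0000409))
```

The mask makes the helper usable with exit codes that a tool reports as a signed 32-bit integer.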

[6888/7628] Building CUDA object caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.obj 
C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\\tmp_bin\randomtemp.exe C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\\tmp_bin\sccache.exe C:\PROGRA~1\NVIDIA~2\CUDA\v12.9\bin\nvcc.exe -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DEXPORT_AOTI_FUNCTIONS -DFMT_HEADER_ONLY=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DTORCH_CUDA_USE_NVTX3 -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_MEM_EFF_ATTENTION -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cuda_EXPORTS -IC:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\pytorch\nlohmann -IC:\actions-runner\_work\pytorch\pytorch\pytorch\moodycamel -IC:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\mimalloc\include -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\THC -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\cuda -IC:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\fmt\include -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\include -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\tools\util\include -IC:\actions-runner\_work\pytorch\pytorch\pytorch\build\caffe2\aten\src -IC:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\.. -IC:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\..\.. -IC:\actions-runner\_work\pytorch\pytorch\pytorch\c10\.. 
-IC:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\api\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\build\third_party\gloo -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\cmake\..\third_party\gloo -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\protobuf\src -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Library\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\XNNPACK\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\ittapi\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\cmake\..\third_party\eigen -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\ideep\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\INTERFACE -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\nlohmann\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\concurrentqueue -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\third_party\NVTX\c\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -isystem C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\magma_cuda129_release\include -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -Xcompiler  /Zc:__cplusplus -Xcompiler /w -w -Xcompiler /FS -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch --use-local-env -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 
-gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_100,code=sm_100 -gencode arch=compute_120,code=sm_120 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --Werror cross-execution-space-call --no-host-device-move-forward --expt-relaxed-constexpr --expt-extended-lambda -Xfatbin -compress-all -Xcompiler=/wd4819,/wd4503,/wd4190,/wd4244,/wd4251,/wd4275,/wd4522 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -Xcompiler="-O2 -Ob2" -DNDEBUG -Xcompiler /MD -std=c++17 -Xcompiler=-MD -Xcompiler=-Z7 -DMKL_HAS_SBGEMM -DMKL_HAS_SHGEMM -DCAFFE2_USE_GLOO -MD -MT caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj -MF caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj.d -x cu -c C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\SegmentReduce.cu -o caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj -Xcompiler=-Fdcaffe2\CMakeFiles\torch_cuda.dir\,-FS
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error   : '""%CICC_PATH%\cicc"' died with status 0xC0000409 
Retry attempt: 1
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error   : '""%CICC_PATH%\cicc"' died with status 0xC0000409 
Retry attempt: 2
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error   : '""%CICC_PATH%\cicc"' died with status 0xC0000409 
Retry attempt: 3
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error   : '""%CICC_PATH%\cicc"' died with status 0xC0000409 

.....
ninja: build stopped: subcommand failed.
-- Building version 2.8.0.dev20250617+cu129
-- Checkout nccl release tag: v2.27.3-1
cmake -GNinja -DBUILD_ENVIRONMENT=windows-binary-wheel -DBUILD_PYTHON=True -DBUILD_PYTHONLESS= -DBUILD_TEST=True -DBUILD_TYPE=release -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.9/bin/nvcc.exe -DCMAKE_CUDA_COMPILER_LAUNCHER=C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\\tmp_bin\randomtemp.exe;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\\tmp_bin\sccache.exe -DCMAKE_GENERATOR=Ninja -DCMAKE_INSTALL_PREFIX=C:\actions-runner\_work\pytorch\pytorch\pytorch\torch -DCMAKE_PREFIX_PATH=C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Lib\site-packages;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Library\;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Lib\site-packages\cmake\data\bin;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Scripts;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\Scripts;C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Amazon\cfn-bootstrap;C:\ProgramData\chocolatey\bin;C:\Program Files\Amazon\AWSCLIV2;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit;C:\Users\runneruser\AppData\Local\Microsoft\WindowsApps -DCUDA_NVCC_EXECUTABLE=C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\/tmp_bin/nvcc.bat -DINSTALL_TEST=0 -DPython_EXECUTABLE=C:\actions-runner\_work\pytorch\pytorch\pytorch\.ci\pytorch\windows\Python\python.exe 
-DTORCH_BUILD_VERSION=2.8.0.dev20250617+cu129 -DTORCH_CUDA_ARCH_LIST=7.5;8.0;8.6;9.0;10.0;12.0 -DUSE_FBGEMM=1 -DUSE_GLOO_WITH_OPENSSL=ON -DUSE_GOLD_LINKER=OFF -DUSE_NUMPY=True -DUSE_SCCACHE=1 -DUSE_SPLIT_BUILD= C:\actions-runner\_work\pytorch\pytorch\pytorch
cmake --build . --target install --config Release -j 12
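When triaging a log like the one above, it can help to summarize which targets failed and how often the out-of-memory error recurred. The following is a minimal sketch, assuming the log uses exactly the line shapes seen in this issue ("FAILED: <target>", "LLVM ERROR: out of memory", "Retry attempt: N"). The function name `summarize_build_log` and the returned dictionary keys are hypothetical and not part of any PyTorch tooling.

```python
import re

# Line shapes taken from the log in this issue; other builds may differ.
FAILED_RE = re.compile(r"^FAILED:\s+(\S+)")
RETRY_RE = re.compile(r"^Retry attempt:\s+(\d+)")
OOM_LINE = "LLVM ERROR: out of memory"


def summarize_build_log(text: str) -> dict:
    """Collect failed targets, OOM error count, and highest retry number."""
    failed_targets = []
    oom_errors = 0
    max_retry = 0
    for raw in text.splitlines():
        line = raw.strip()
        match = FAILED_RE.match(line)
        if match:
            failed_targets.append(match.group(1))
            continue
        if line == OOM_LINE:
            oom_errors += 1
            continue
        match = RETRY_RE.match(line)
        if match:
            max_retry = max(max_retry, int(match.group(1)))
    return {
        "failed_targets": failed_targets,
        "oom_errors": oom_errors,
        "max_retry": max_retry,
    }


if __name__ == "__main__":
    # Abbreviated excerpt of the log above (compiler command line omitted).
    sample = """\
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.obj
LLVM ERROR: out of memory
Retry attempt: 1
LLVM ERROR: out of memory
Retry attempt: 2
LLVM ERROR: out of memory
Retry attempt: 3
LLVM ERROR: out of memory
ninja: build stopped: subcommand failed.
"""
    print(summarize_build_log(sample))
```

On the excerpt used here, the summary reports a single failed target (SegmentReduce.cu.obj), four out-of-memory errors, and three retries, matching what the log shows.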

Looks like we are experiencing this issue: https://discuss.pytorch.org/t/segment-reduce-memory-problems-while-building/220488

Versions

2.8.0

cc @malfet @seemethere @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168

Labels

module: build (Build system issues)
module: cuda (Related to torch.cuda, and CUDA support in general)
module: third_party
module: windows (Windows support for PyTorch)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
