🐛 Describe the bug
A codegen error is generated from a simple view + native_layer_norm_backward pattern.
Repro script:
import torch

inps = [(torch.Size([768]), torch.float32), (torch.Size([768]), torch.float32), (torch.Size([4, 512, 768]), torch.float32), (torch.Size([4, 512, 1]), torch.float32), (torch.Size([4, 512, 1]), torch.float32), (torch.Size([2048, 768]), torch.float32)]
inps = [torch.ones(shape, dtype=dtype, device='cuda') for (shape, dtype) in inps]

def forward(primals_5, primals_6, primals_15, getitem_2, getitem_1, mm_6):
    view_19 = torch.ops.aten.view(mm_6, [4, 512, 768])
    native_layer_norm_backward_1 = torch.ops.aten.native_layer_norm_backward(
        view_19, primals_15, [768], getitem_1, getitem_2, primals_6, primals_5,
        [True, True, True])
    return (native_layer_norm_backward_1,)

f = torch.jit.script(forward)
with torch.jit.fuser("fuser2"):
    for _ in range(5):
        f(*inps)
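For reference, here is a sketch of the same computation in eager mode, with the repro's positional arguments mapped onto the ATen schema for native_layer_norm_backward (grad_out, input, normalized_shape, mean, rstd, weight, bias, output_mask); the role comments are inferred positionally from that schema and are not part of the original trace:

import torch

mm_6 = torch.ones(2048, 768, device='cuda')
grad_out = mm_6.view(4, 512, 768)               # view_19
x = torch.ones(4, 512, 768, device='cuda')      # primals_15: layer-norm input
mean = torch.ones(4, 512, 1, device='cuda')     # getitem_1: saved mean
rstd = torch.ones(4, 512, 1, device='cuda')     # getitem_2: saved reciprocal std
weight = torch.ones(768, device='cuda')         # primals_6
bias = torch.ones(768, device='cuda')           # primals_5

# Returns (grad_input, grad_weight, grad_bias), per the [True, True, True] mask.
grads = torch.ops.aten.native_layer_norm_backward(
    grad_out, x, [768], mean, rstd, weight, bias, [True, True, True])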
Run the script with PYTORCH_NVFUSER_DISABLE_FALLBACK=1 set in the environment, so that nvFuser surfaces the compile error instead of silently falling back to eager execution.
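A minimal sketch of setting the flag from within the script itself; the assumption here is that placing it in the environment before torch is imported is sufficient, which is the safe ordering regardless of when nvFuser actually reads the variable:

import os
# Assumption: set before importing torch so nvFuser sees it no matter
# when the variable is read.
os.environ["PYTORCH_NVFUSER_DISABLE_FALLBACK"] = "1"

import torch
# ... rest of the repro script above ...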
Generated error log:
CUDA NVRTC compile error: default_program(3358): error: identifier "i11" is undefined

where i11 was used earlier in a predicate:

if (((i221 < T8.size[2]) && (((((((((nvfuser_index_t)blockIdx.y) * (ceilDiv((ceilDiv((ceilDiv((ceilDiv((T8.size[0] * T8.size[1]), ((nvfuser_index_t)blockDim.y))), 4)), 1)), ((nvfuser_index_t)gridDim.y)))) + i174) * 4) + (i179 + nvfuser_zero)) * ((nvfuser_index_t)blockDim.y)) + ((nvfuser_index_t)threadIdx.y)) < (4 * (ceilDiv(i11, 4)))))) {
Full log attached.
Versions
Reproduced on ToT devel. Looks like a real issue.