slow test failure: test_nvfuser_extremal_values_masked_amin_cuda_float32 #2169

@zasdfgbnm

🐛 Describe the bug

To reproduce:

PYTORCH_TEST_WITH_SLOW=1 python test/test_jit_cuda_fuser.py -v -k test_nvfuser_extremal_values_masked_amin_cuda_float32

Fusion math is:

Inputs:
  T0_g[ iS0{i1}, iS1{i2}, bS2{1}, iS3{i4} ], float
  T1_g[ 0 ], float
  T2_g[ iS4{i5}, iS5{i6}, bS6{1}, iS7{i7} ], bool
Outputs:
  T9_g[ 0 ], float

%kernel_math {
T3_l[ iS8{i5}, iS9{i6}, bS10{1}, iS11{i7} ]
   = T2_g[ iS4{i5}, iS5{i6}, bS6{1}, iS7{i7} ];
T4_l[ iS12{i5}, iS13{i6}, bS14{1}, iS15{i7} ]
   = T3_l[ iS8{i5}, iS9{i6}, bS10{1}, iS11{i7} ];
T5_l[ bS16{1}, bS17{1}, bS18{1}, bS19{1} ]
   = broadcast( T1_g[ 0 ] )
T6_l[ iS20{i5}, iS21{i6}, bS22{1}, iS23{i7} ]
   = where(T4_l[ iS12{i5}, iS13{i6}, bS14{1}, iS15{i7} ]
  , T0_g[ iS0{i1}, iS1{i2}, bS2{1}, iS3{i4} ]
  , T5_l[ bS16{1}, bS17{1}, bS18{1}, bS19{1} ]);
T7_l[ iS24{i5}, iS25{i6}, iS26{i7} ]
   = squeeze( T6_l[ iS20{i5}, iS21{i6}, bS22{1}, iS23{i7} ] )
T8_l[ rS27{i5}, rS28{i6}, rS29{i7} ]
   = reduction( T7_l[ iS24{i5}, iS25{i6}, iS26{i7} ], op = fmin, initial value = double(inf), allreduce = false )
T9_g[ 0 ]
   = T8_l[ rS27{i5}, rS28{i6}, rS29{i7} ];
}

However, the input actually passed as T2 has shape (3, 2, 1, 1). Its last dimension has size 1 but is marked as an iteration domain (iS7{i7}) rather than a broadcast domain, so I think codegen assumes i4 == i7 and generates code based on that assumption.
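
To make the mismatch concrete, here is a minimal pure-Python sketch that compares the domain markings printed for T2 against the runtime shape. It is not nvfuser code: `T2_DOMAINS` and `unmarked_broadcast_axes` are hypothetical names introduced only for illustration, and the domain list is hand-transcribed from the dump above.

```python
from typing import List, Sequence, Tuple

# (kind, extent) per axis, hand-transcribed from the T2_g line in the dump:
# "i" stands for an iteration domain (iS), "b" for a broadcast domain (bS).
T2_DOMAINS: List[Tuple[str, str]] = [
    ("i", "i5"),
    ("i", "i6"),
    ("b", "1"),
    ("i", "i7"),
]


def unmarked_broadcast_axes(
    domains: Sequence[Tuple[str, str]], shape: Sequence[int]
) -> List[int]:
    """Return the axes whose runtime size is 1 but which are not marked broadcast."""
    if len(domains) != len(shape):
        raise ValueError("rank of the domain list and the shape must match")
    return [
        axis
        for axis, ((kind, _extent), size) in enumerate(zip(domains, shape))
        if size == 1 and kind != "b"
    ]


# Runtime shape of T2 as reported in this issue.
print(unmarked_broadcast_axes(T2_DOMAINS, (3, 2, 1, 1)))  # [3]
```

Axis 2 is size 1 and already marked `bS6{1}`, so it is not reported. Axis 3 is also size 1 but is marked `iS7{i7}`, which is the dimension this report suspects of being tied to i4.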

Versions

TOT devel
