-
Notifications
You must be signed in to change notification settings - Fork 7
Description
🐛 Describe the bug
The permissive mapping implementation today could use some further formalization to make it easier and more robust to use.
One counter-intuitive behavior that was addressed in #1960 is that exact
mapping does not imply permissive
mapping, which could make the indexing implementation error prone.
Repro:
In test case NVFuserTest.FusionMappingRelation_CUDA
%kernel {
T2_l[ bS17{( 1 * 1 )} ]
= T0_g[ bS18{( 1 * 1 )} ];
T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
= broadcast( T2_l[ bS17{( 1 * 1 )} ] )
T4_g[ iS14{( i4 * ( 1 * 1 ) )} ]
= T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
+ T1_g[ iS20{( i4 * ( 1 * 1 ) )} ];
TransformPrinter :
T0_g[ bS18{( 1 * 1 )} ]
root domain : (bS0{1},bS1{1})
Merge: bS0{1} and bS1{1} -> bS18{( 1 * 1 )}
T2_l[ bS17{( 1 * 1 )} ]
root domain : (bS5{1},bS6{1})
Merge: bS5{1} and bS6{1} -> bS17{( 1 * 1 )}
T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
root domain : (bS7{1},bS8{1},bS9{1})
Merge: bS8{1} and bS9{1} -> bS15{( 1 * 1 )}
Merge: bS7{1} and bS15{( 1 * 1 )} -> bS16{( 1 * ( 1 * 1 ) )}
T1_g[ iS20{( i4 * ( 1 * 1 ) )} ]
root domain : (iS2{i4},bS3{1},bS4{1})
Merge: bS3{1} and bS4{1} -> bS19{( 1 * 1 )}
Merge: iS2{i4} and bS19{( 1 * 1 )} -> iS20{( i4 * ( 1 * 1 ) )}
T4_g[ iS14{( i4 * ( 1 * 1 ) )} ]
root domain : (iS10{i4},bS11{1},bS12{1})
Merge: bS11{1} and bS12{1} -> bS13{( 1 * 1 )}
Merge: iS10{i4} and bS13{( 1 * 1 )} -> iS14{( i4 * ( 1 * 1 ) )}
}
Permissive map:
{iS21{T1.size[0]}*; bS7{1}; iS22{T1.size[0]} }
{bS11{1}*; bS8{1}; bS5{1}; bS0{1}; bS3{1} }
{bS12{1}*; bS9{1}; bS6{1}; bS1{1}; bS4{1} }
{bS13{( 1 * 1 )}*; bS15{( 1 * 1 )}; bS19{( 1 * 1 )} }
{iS14{( T1.size[0] * ( 1 * 1 ) )}*; bS16{( 1 * ( 1 * 1 ) )}; bS17{( 1 * 1 )}; bS18{( 1 * 1 )}; iS20{( T1.size[0] * ( 1 * 1 ) )} }
Exact map:
{bS7{1}* }
{bS16{( 1 * ( 1 * 1 ) )}* }
{iS21{T1.size[0]}*; iS22{T1.size[0]} }
{bS11{1}*; bS8{1}; bS5{1}; bS0{1}; bS3{1} }
{bS12{1}*; bS9{1}; bS6{1}; bS1{1}; bS4{1} }
{bS13{( 1 * 1 )}*; bS15{( 1 * 1 )}; bS17{( 1 * 1 )}; bS18{( 1 * 1 )}; bS19{( 1 * 1 )} }
{iS14{( T1.size[0] * ( 1 * 1 ) )}*; iS20{( T1.size[0] * ( 1 * 1 ) )} }
bS17
is exact mapped to bS13
but is not permissive mapped to bS13
, which was the part that confused the indexing pass.
Semantically, depending on how we want to define permissive mapping, it might make sense to say all of bS13, iS10, and iS14
permissively mapped to bS17, but we might lose precision and consistency quite quickly once we start to map iterdomains on the same tensordomain to each other. This would be the formalization part that'd need some follow up work.
[TODO] another example to come as well.
Versions
NVFuser TOT