Skip to content

Formalization on permissive mapping #1963

@shmsong

Description

@shmsong

🐛 Describe the bug

The permissive mapping implementation today could use some further formalization to make it easier and more robust to use.

One counter-intuitive behavior that was addressed in #1960 is that exact mapping does not imply permissive mapping, which could make the indexing implementation error prone.

Repro:

In test case NVFuserTest.FusionMappingRelation_CUDA

%kernel {
T2_l[ bS17{( 1 * 1 )} ]
   = T0_g[ bS18{( 1 * 1 )} ];
T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
   = broadcast( T2_l[ bS17{( 1 * 1 )} ] )
T4_g[ iS14{( i4 * ( 1 * 1 ) )} ]
   = T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
   + T1_g[ iS20{( i4 * ( 1 * 1 ) )} ];

TransformPrinter : 
T0_g[ bS18{( 1 * 1 )} ]
 root domain : (bS0{1},bS1{1})
  Merge: bS0{1} and bS1{1} -> bS18{( 1 * 1 )}
T2_l[ bS17{( 1 * 1 )} ]
 root domain : (bS5{1},bS6{1})
  Merge: bS5{1} and bS6{1} -> bS17{( 1 * 1 )}
T3_l[ bS16{( 1 * ( 1 * 1 ) )} ]
 root domain : (bS7{1},bS8{1},bS9{1})
  Merge: bS8{1} and bS9{1} -> bS15{( 1 * 1 )}
  Merge: bS7{1} and bS15{( 1 * 1 )} -> bS16{( 1 * ( 1 * 1 ) )}
T1_g[ iS20{( i4 * ( 1 * 1 ) )} ]
 root domain : (iS2{i4},bS3{1},bS4{1})
  Merge: bS3{1} and bS4{1} -> bS19{( 1 * 1 )}
  Merge: iS2{i4} and bS19{( 1 * 1 )} -> iS20{( i4 * ( 1 * 1 ) )}
T4_g[ iS14{( i4 * ( 1 * 1 ) )} ]
 root domain : (iS10{i4},bS11{1},bS12{1})
  Merge: bS11{1} and bS12{1} -> bS13{( 1 * 1 )}
  Merge: iS10{i4} and bS13{( 1 * 1 )} -> iS14{( i4 * ( 1 * 1 ) )}
}

Permissive map:
  {iS21{T1.size[0]}*; bS7{1}; iS22{T1.size[0]} }
  {bS11{1}*; bS8{1}; bS5{1}; bS0{1}; bS3{1} }
  {bS12{1}*; bS9{1}; bS6{1}; bS1{1}; bS4{1} }
  {bS13{( 1 * 1 )}*; bS15{( 1 * 1 )}; bS19{( 1 * 1 )} }
  {iS14{( T1.size[0] * ( 1 * 1 ) )}*; bS16{( 1 * ( 1 * 1 ) )}; bS17{( 1 * 1 )}; bS18{( 1 * 1 )}; iS20{( T1.size[0] * ( 1 * 1 ) )} }
Exact map:
  {bS7{1}* }
  {bS16{( 1 * ( 1 * 1 ) )}* }
  {iS21{T1.size[0]}*; iS22{T1.size[0]} }
  {bS11{1}*; bS8{1}; bS5{1}; bS0{1}; bS3{1} }
  {bS12{1}*; bS9{1}; bS6{1}; bS1{1}; bS4{1} }
  {bS13{( 1 * 1 )}*; bS15{( 1 * 1 )}; bS17{( 1 * 1 )}; bS18{( 1 * 1 )}; bS19{( 1 * 1 )} }
  {iS14{( T1.size[0] * ( 1 * 1 ) )}*; iS20{( T1.size[0] * ( 1 * 1 ) )} }

bS17 is exact mapped to bS13 but is not permissive mapped to bS13, which was the part that confused the indexing pass.

Semantically, depending on how we want to define permissive mapping, it might make sense to say all of bS13, iS10, and iS14 permissively mapped to bS17, but we might lose precision and consistency quite quickly once we start to map iterdomains on the same tensordomain to each other. This would be the formalization part that'd need some follow up work.

[TODO] another example to come as well.

Versions

NVFuser TOT

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions