modify cast from hp to mx to help inductor fuse #1786
Merged
Summary:
Thanks to investigation from @eellison, moving the reshape to the end of the cast helps inductor fuse the cast into a single kernel. This doesn't yet work with fp4, but let's unblock fp8 and deal with fp4 later.
Fixes #1769
Note: in the repro with swizzling from #1773, we go from 3 to 2 kernels. Further investigation is needed to determine whether the swizzling can be fused as well.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: