modify cast from hp to mx to help inductor fuse #1786


Merged · 1 commit · Feb 26, 2025

Conversation

vkuzo
Contributor

@vkuzo vkuzo commented Feb 26, 2025

Summary:

Thanks to investigation from @eellison, moving the reshape to the end of the cast helps inductor fuse the cast into a single kernel. This doesn't yet work with fp4, but let's unblock fp8 and deal with fp4 later.

Fixes #1769

Note: in the repro with swizzling from #1773, we go from 3 to 2 kernels. Further investigation is needed into whether the swizzling can also be fused.

Test Plan:

```
pytest test/prototype/mx_formats/test_mx_tensor.py -x -s -k test_to_mx_inductor_single_kernel
```

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Feb 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1786

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 3 Pending

As of commit 584efe0 with merge base d00ee41:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 26, 2025
@vkuzo vkuzo requested a review from eellison February 26, 2025 21:12
@vkuzo vkuzo added the topic: performance Use this tag if this PR improves the performance of a feature label Feb 26, 2025
@vkuzo vkuzo force-pushed the 20250226_mx_single_cast_kernel branch from d233496 to 584efe0 Compare February 26, 2025 21:13
Contributor

@eellison eellison left a comment


Looks good as a workaround! I'm still planning on making the code fuse as it did previously, but this is worth landing in the interim.

@vkuzo vkuzo merged commit 8d110bf into main Feb 26, 2025
17 checks passed
Labels: CLA Signed, topic: performance
Development

Successfully merging this pull request may close these issues.

torch.compile cast to mxfp8 should only require one kernel
3 participants