[PyTorch] Multi-tensor swizzle scaling factors for MXFP8 and fuse padding zeros #2019


Merged: 12 commits, Aug 6, 2025

yaox12
Member

@yaox12 yaox12 commented Aug 1, 2025

Description

  1. Implement multi-tensor swizzle scaling factor kernels.
  2. Fuse padding scaling factors with zeros into the swizzling kernels.
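The two items above can be pictured with a small pure-Python sketch. It is not the Transformer Engine kernel code: the function names, the 128x4 tile shape, and the within-tile offset formula are all assumptions chosen for illustration and are not taken from this PR. It only shows the idea of writing zeros into the padded region in the same pass that reorders the scaling factors, and of handling a list of tensors in one call.

```python
# Hypothetical illustration only; not the kernels added by this PR.
# ASSUMPTION: scaling factors are padded to multiples of a 128x4 tile and
# reordered inside each tile with the offset formula used below.
TILE_ROWS, TILE_COLS = 128, 4
TILE_SIZE = TILE_ROWS * TILE_COLS


def round_up(x, multiple):
    """Round x up to the next multiple of `multiple`."""
    return (x + multiple - 1) // multiple * multiple


def pad_and_swizzle(scale_inv, rows, cols):
    """Pad a flat row-major (rows x cols) list with zeros and swizzle it.

    The output buffer starts as all zeros, so the padding is produced by the
    same pass that writes the swizzled values (no separate padding step).
    """
    padded_rows = round_up(rows, TILE_ROWS)
    padded_cols = round_up(cols, TILE_COLS)
    tiles_per_row = padded_cols // TILE_COLS
    out = [0] * (padded_rows * padded_cols)
    for r in range(rows):
        for c in range(cols):
            tile = (r // TILE_ROWS) * tiles_per_row + (c // TILE_COLS)
            rr, cc = r % TILE_ROWS, c % TILE_COLS
            # Assumed within-tile layout: a bijection onto 0..511.
            offset = (rr % 32) * 16 + (rr // 32) * 4 + cc
            out[tile * TILE_SIZE + offset] = scale_inv[r * cols + c]
    return out


def multi_tensor_pad_and_swizzle(tensors):
    """Process a list of (flat_values, rows, cols) in one call.

    An empty list returns an empty list.
    """
    return [pad_and_swizzle(flat, r, c) for flat, r, c in tensors]
```

On a GPU the per-tensor loop would correspond to a single launch covering all tensors rather than one launch per tensor; the sketch only mirrors the data movement, not the performance behaviour.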

In an Nsys profile with 8 experts, each with (m, n, k) = (4096, 4096, 4096), we see a 1.4x speedup on the host side and a 1.7x speedup on the device side.

  • Main branch: (image)
  • This PR: (image)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@yaox12
Member Author

yaox12 commented Aug 1, 2025

/te-ci

@yaox12
Member Author

yaox12 commented Aug 1, 2025

/te-ci

@phu0ngng
Collaborator

phu0ngng commented Aug 4, 2025

My review focuses mainly on the changes to TE/Common. LGTM!

@yaox12
Member Author

yaox12 commented Aug 6, 2025

/te-ci

@timmoon10 timmoon10 merged commit c0d2f1a into NVIDIA:main Aug 6, 2025
41 checks passed
nv-akorzh pushed a commit to nv-akorzh/TransformerEngine that referenced this pull request Aug 6, 2025
…ding zeros (NVIDIA#2019)

* for loop

Signed-off-by: Xin Yao <[email protected]>

* bulk alloc

Signed-off-by: Xin Yao <[email protected]>

* multi-tensor swizzle

Signed-off-by: Xin Yao <[email protected]>

* pad zeros in swizzle kernels

Signed-off-by: Xin Yao <[email protected]>

* unify single- and multi-tensor swizzle

Signed-off-by: Xin Yao <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix empty tensor list

Signed-off-by: Xin Yao <[email protected]>

* fix bug for col swizzle

Signed-off-by: Xin Yao <[email protected]>

* check context & fix signifiers

Signed-off-by: Xin Yao <[email protected]>

---------

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Anton Korzh <[email protected]>
nv-akorzh pushed a commit to nv-akorzh/TransformerEngine that referenced this pull request Aug 7, 2025
…ding zeros (NVIDIA#2019)
