[PyTorch] Refactor C++ quantizer infrastructure #1952
Merged
ksivaman
merged 32 commits into
NVIDIA:main
from
timmoon10:refactor-quantizer-create-tensor-func
Jul 29, 2025
Merged
Commits (32)
c6efd90  remove reciprocal op (zhongbozhu)
2fdef53  Refactor Quantizer::create_tensor function (timmoon10)
bd5e1dd  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
1338edf  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6d30bb9  Fix bug when constructing FP8 tensor (timmoon10)
5fca0a0  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
dc6fae5  Add quantize function to C++ quantizers (timmoon10)
7ac091d  Prototype function to coerce Python quantized tensors to match quantizer (timmoon10)
b30a4b4  Use quantizer class in tex.quantize (timmoon10)
23be7be  Add FP8 current scaling support for activation backward (timmoon10)
302a77d  Disable quantized GEMM output with FP8 current scaling (timmoon10)
952333a  Add coerce_tensor functions for MXFP8 and DSv3 (timmoon10)
86af34c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
d0479a9  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
596ead5  Avoid quantizing empty tensors (timmoon10)
c4270b3  Use consistent shapes for FP8 transposes (timmoon10)
34d1fde  In attention impl, construct FP8 tensors with pre-initialized scale-invs (timmoon10)
a49cb5e  Initialize MXFP8 scales to zero (timmoon10)
ba68676  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
0a79048  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
76d2d53  Store copy of quantizer when creating quantized tensors (timmoon10)
c54d821  Fix linter warnings (timmoon10)
c5d0e46  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
c252dc0  Make sure quantized tensors have private quantizer (timmoon10)
c3c1df3  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
df6313c  Rename "coerce_tensor" to "convert_and_update_tensor" (timmoon10)
27cf92a  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
3e7dbb1  Make sure CUDA context is available when launching NVRTC kernel (timmoon10)
6bdbb12  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
261f60f  Expose CUDA context creation function externally (timmoon10)
970e54d  Merge branch 'main' into refactor-quantizer-create-tensor-func (timmoon10)
2cd7fb2  Merge branch 'main' into refactor-quantizer-create-tensor-func (ksivaman)
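For orientation, here is a rough sketch of the interface shape these commits suggest. The method names come from the commit messages (create_tensor, quantize, convert_and_update_tensor), but the signatures and the TensorWrapper placeholder are assumptions, not the PR's actual code:

```cpp
// Rough sketch only: method names follow the commit messages above, but
// the signatures and helper types are assumptions, not the PR's code.
#include <pybind11/pybind11.h>
#include <cstddef>
#include <vector>

namespace py = pybind11;

struct TensorWrapper;  // placeholder for TE's C++ tensor wrapper

class Quantizer {
 public:
  virtual ~Quantizer() = default;

  // Construct an empty quantized tensor matching this quantizer's
  // configuration (cf. "Refactor Quantizer::create_tensor function").
  virtual py::object create_tensor(const std::vector<size_t>& shape) const = 0;

  // Quantize input data into a quantized output tensor
  // (cf. "Add quantize function to C++ quantizers").
  virtual void quantize(const TensorWrapper& input, py::object& out) const = 0;

  // Coerce an existing Python quantized tensor so it matches this
  // quantizer (cf. the rename of "coerce_tensor" to
  // "convert_and_update_tensor").
  virtual py::object convert_and_update_tensor(py::object tensor) const = 0;
};
```

Storing a private copy of the quantizer in each quantized tensor (commits 76d2d53 and c252dc0) presumably decouples a tensor's state from later mutations of the module-level quantizer.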
This PR exposed a bug in our NVRTC infrastructure. Three facts:
1. NVRTC-compiled kernels are launched through the CUDA driver API, which requires a current CUDA context on the calling thread.
2. PyTorch ops initialize a CUDA context implicitly through the runtime API.
3. PyTorch autograd runs the backward pass on a separate thread, which starts with an empty context stack.
By removing unnecessary at::reciprocal calls from create_tensor, I hit cases where the backward pass launched an NVRTC kernel before launching any PyTorch op (namely in the FP8 linear op with UB). Since the autograd thread's context stack was empty, this resulted in "invalid device context" errors.
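For illustration, a minimal sketch of the kind of context guard that addresses this, using the CUDA driver API's primary-context functions. The helper name ensure_cuda_context is hypothetical, not the function this PR actually adds:

```cpp
// Hypothetical sketch (not this PR's actual helper): ensure the calling
// thread has a current CUDA context before a driver-API launch, e.g. an
// NVRTC-compiled kernel. PyTorch creates a context lazily via the runtime
// API, but a fresh autograd thread has an empty driver context stack.
#include <cuda.h>
#include <stdexcept>

void ensure_cuda_context(int device_id) {
  cuInit(0);  // idempotent; required before any other driver-API call
  CUcontext ctx = nullptr;
  if (cuCtxGetCurrent(&ctx) == CUDA_SUCCESS && ctx != nullptr) {
    return;  // this thread already has a current context
  }
  // Retain the device's primary context (the same one the runtime API
  // uses) and make it current on this thread.
  CUdevice dev;
  if (cuDeviceGet(&dev, device_id) != CUDA_SUCCESS) {
    throw std::runtime_error("cuDeviceGet failed");
  }
  if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) {
    throw std::runtime_error("cuDevicePrimaryCtxRetain failed");
  }
  if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS) {
    throw std::runtime_error("cuCtxSetCurrent failed");
  }
}
```

Calling a guard like this at the top of the NVRTC launch path makes the kernel launch independent of whether a PyTorch op ran first on that thread, which is in the spirit of commits 3e7dbb1 and 261f60f above.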
Reply: This is interesting, thanks!