Conversation

@BlackSamorez (Contributor) commented Jun 9, 2025:

This PR adds support for the FP-Quant method.

The goal of this PR is to integrate inference and training support for the FP-Quant method, which uses the Hadamard transform for efficient weights+activations quantization. When used with MXFP4 and MSE-based scaling, it implements the Quartet forward pass. We're also working on adding NVFP4 support and backward-pass support.

Currently, we're working on the kernels in the qutlass repo and on the integration in the fp_quant package (see the installation steps below).
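For intuition, the core idea is to rotate each weight group with a Hadamard matrix (spreading outliers across the group) before quantizing to a low-precision grid with a per-group scale. Below is a rough numpy sketch of that idea, not the qutlass kernels: real MXFP4 packs values with shared power-of-two exponents, which this fake-quantization sketch omits.

```python
import numpy as np
from scipy.linalg import hadamard

# FP4 (E2M1) magnitudes; the full grid is symmetric around zero.
FP4 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-FP4[:0:-1], FP4])

def hadamard_fake_quantize(w: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Rotate groups of `w` with an orthonormal Hadamard matrix, then
    fake-quantize each group to the FP4 grid with a per-group abs-max
    scale. Assumes w.size is divisible by group_size (a power of two)."""
    h = hadamard(group_size) / np.sqrt(group_size)      # orthonormal rotation
    groups = w.reshape(-1, group_size) @ h.T            # spread outliers
    scales = np.abs(groups).max(axis=1, keepdims=True) / GRID.max()
    scales = np.maximum(scales, 1e-12)                  # guard all-zero groups
    idx = np.abs(groups[..., None] / scales[..., None] - GRID).argmin(axis=-1)
    return GRID[idx] * scales                           # dequantized groups
```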

Installation:

  1. Install qutlass: `git clone https://github.com/IST-DASLab/qutlass.git && cd qutlass && pip install --no-build-isolation .`
  2. Install fp_quant: `pip install fp_quant`

Usage:

  1. Use as JIT quantization from any BF16 model by passing `quantization_config=FPQuantConfig()` (see the sketch after this list).
  2. Calibrate with GPTQ using the FP-Quant repo with the `--real_quant` flag.
  3. Use pre-quantized models from hub: coming soon...
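For option 1, a minimal sketch of in-flight quantization; the model name is just a placeholder, and exact kwargs may differ from the merged API:

```python
import torch
from transformers import AutoModelForCausalLM, FPQuantConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",            # any BF16 model (placeholder)
    quantization_config=FPQuantConfig(),  # quantize on load
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```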

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member): cc @MekkCyber

@MekkCyber (Contributor) left a comment:

Hi @BlackSamorez! Thanks a lot for this addition 🤗! Left a few comments!

@@ -0,0 +1,49 @@

```python
# Copyright 2024 The HuggingFace Team. All rights reserved.
```

@MekkCyber (Contributor) suggested change:

```diff
-# Copyright 2024 The HuggingFace Team. All rights reserved.
+# Copyright 2025 The HuggingFace Team. All rights reserved.
```

```python
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file"
```

@MekkCyber (Contributor) suggested change:

```diff
-"HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file"
+"Quartet QAT integration file"
```

Comment on lines 22 to 24:

```python
if is_torch_available():
    pass
```

@MekkCyber (Contributor): we don't need this

@@ -0,0 +1,164 @@

```python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
```

@MekkCyber (Contributor) suggested change:

```diff
-# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
```

Comment on lines 36 to 38:

```python
Quantizer of the HIGGS method. Enables the loading of prequantized models and in-flight quantization of full-precision models.
"""
```

@MekkCyber (Contributor): to be updated

Comment on lines +1163 to +1165:

```python
def is_qutlass_available():
    return _qutlass_available
```

@MekkCyber (Contributor): I can't find a distribution for qutlass. Is it not released yet?

@BlackSamorez (Contributor, Author): It has just been released: https://github.com/IST-DASLab/qutlass

Comment on lines 125 to 128:

```python
for name, module in tqdm(quartet_qat_modules.items(), desc="Pre-processing Quartet QAT modules", leave=False):
    pass
    # module.pre_forward()
```

@MekkCyber (Contributor): What's meant to happen here exactly?

Comment on lines 160 to 164:

```python
if isinstance(module, QuartetLinear) and tensor_name == "weight":
    # Only quantize weights of QuartetLinear modules that are not already quantized
    return True
else:
    return False
```

@MekkCyber (Contributor): Is the bias quantized too?

Comment on lines 96 to 97:

```python
assert isinstance(module, QuartetLinear), f"Module {param_name} is not a QuartetLinear somehow..."
```

@MekkCyber (Contributor): No need for an assert here; we can just raise an error instead.

Comment on lines 99 to 100:

```python
module.pre_forward()
```

@MekkCyber (Contributor): What's happening here?

@BlackSamorez (Contributor, Author):

  1. Hadamard transform matrix initialization on the correct devices.
  2. Since it's a QAT method, we might or might not want to keep a full-precision weight copy. If we don't need it, this function also deletes the `.weight` parameter after quantizing it. Here's the code (roughly sketched below).
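Based on that description, a hypothetical sketch of what `pre_forward` does; the real implementation lives in fp_quant, and the attribute names below are assumptions for illustration:

```python
import torch
from scipy.linalg import hadamard

class QuartetLinearSketch(torch.nn.Linear):
    """Hypothetical stand-in for fp_quant's QuartetLinear."""
    hadamard_group_size = 32       # must divide in_features
    store_master_weights = False   # True would keep the BF16 copy for QAT

    def pre_forward(self):
        # 1. Hadamard transform matrix initialization on the weight's device.
        h = torch.as_tensor(hadamard(self.hadamard_group_size), dtype=torch.float32)
        self.register_buffer("hadamard_matrix",
                             (h / self.hadamard_group_size ** 0.5).to(self.weight.device))
        # 2. Quantize the weight (plain rounding stands in for MXFP4 packing),
        #    then drop the full-precision master copy unless QAT needs it.
        w = self.weight.data.float().reshape(-1, self.hadamard_group_size)
        self.register_buffer("qweight", torch.round(w @ self.hadamard_matrix))
        if not self.store_master_weights:
            del self.weight  # frees the `.weight` parameter, as described above
```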

@SunMarc self-requested a review, June 12, 2025 15:31
@kooshi (Contributor) commented Jun 30, 2025:

Hi @BlackSamorez, I'm really looking forward to experimenting with this.

When can we expect to have the kernels public so we can begin testing, even if they are still WIP?

@BlackSamorez changed the title from "[WIP] Quartet QAT support" to "[WIP] FP-Quant support", Jul 13, 2025
@BlackSamorez (Contributor, Author):

@MekkCyber Hi, thanks for reviewing this!
It took us a while, but all the kernels necessary for inference have been published: I've updated the PR description.
May I ask you to do another pass? Your previous comments mostly don't apply anymore because of refactoring.

@SunMarc (Member) left a comment:

Thanks for iterating on the PR and congrats on the release! The only major thing missing before we merge this is some documentation for this new method! Please ping me when it's done and I'll merge the PR!

Comment on lines 1566 to 1567:

```python
store_master_weights (`bool`, *optional*, defaults to `False`):
    Whether to store the master weights.
```

@SunMarc (Member): In which context could storing master weights be useful?

@BlackSamorez (Contributor, Author): In the context of QAT, which we'll add in a later release: we're still working on the quantized backward-pass kernels. But I thought it would make sense to include this option right away so we don't have to edit the config later.
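As a hedged illustration, once QAT lands, keeping the BF16 master copy would presumably be requested through the config; the kwarg name comes from the docstring above:

```python
from transformers import FPQuantConfig

# Keep the full-precision master weights around for a future QAT run.
config = FPQuantConfig(store_master_weights=True)
```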

@BlackSamorez (Contributor, Author): Added this to the docstring.

Comment on lines 1562 to 1563:

```python
forward_method (`str`, *optional*, defaults to `"abs_max"`):
    The method to use for the forward pass.
```

@SunMarc (Member): We have `abs_max` and `quest` for this arg. Can you explain a bit what `quest` does?

@BlackSamorez (Contributor, Author): Added a docstring explanation.
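For contrast between the two options: `abs_max` maps the largest value in a group to the largest grid point, while an MSE-based rule (the family `quest` belongs to) picks the scale that minimizes round-trip error. A generic illustration of the difference, not the exact QuEST formula (see the fp_quant docstring for that):

```python
import numpy as np

FP4 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-FP4[:0:-1], FP4])   # symmetric FP4 (E2M1) grid

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    idx = np.abs(x[:, None] / scale - GRID).argmin(axis=1)
    return GRID[idx] * scale

def absmax_scale(x: np.ndarray) -> float:
    # Largest value lands exactly on the largest grid point.
    return np.abs(x).max() / GRID.max()

def mse_scale(x: np.ndarray) -> float:
    # Search shrunken (clipping) scales; keep the one with the lowest MSE.
    base = absmax_scale(x)
    shrink = np.linspace(0.3, 1.0, 50)
    errs = [np.mean((x - quantize(x, s * base)) ** 2) for s in shrink]
    return base * shrink[int(np.argmin(errs))]
```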

Comment on lines 1568 to 1569:

```python
hadamard_group_size (`int`, *optional*, defaults to 32):
    The group size for the hadamard transform.
```

@SunMarc (Member): Explain a bit what this does.

@BlackSamorez (Contributor, Author): Improved the docstring.

Comment on lines 32 to 36:

```python
if is_torch_available():
    pass

if is_accelerate_available():
    pass
```

@SunMarc (Member): remove

@BlackSamorez (Contributor, Author): Done

```python
return

module.weight = torch.nn.Parameter(param_value.to(target_device))
module.pre_forward()
```

@SunMarc (Member): Really nice to put all the quantization logic there.

@HuggingFaceDocBuilderDev: The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BlackSamorez (Contributor, Author): @SunMarc added docs, improved the docstring, and cleaned the code where you asked.

@BlackSamorez (Contributor, Author): Should be good.

@SunMarc (Member) left a comment:

LGTM! Thanks for iterating!

@SunMarc enabled auto-merge (squash), July 22, 2025 15:01
@SunMarc (Member) commented Jul 22, 2025:

One last nit: the build PR documentation check is not passing:

```
    raise RuntimeError(
RuntimeError: The following files are not present in the table of contents:
- quantization/fp_quant
Add them to ../transformers/docs/source/en/_toctree.yml.
```

auto-merge was automatically disabled July 22, 2025 15:23

Head branch was pushed to by a user without write access

@BlackSamorez (Contributor, Author): Added it to the toctree.

@BlackSamorez (Contributor, Author): @SunMarc it hit a job cancellation somehow; might need a restart. It should be good.

[For maintainers] Suggested jobs to run (before merge): `run-slow: fp_quant_integration`

@SunMarc merged commit 623ab01 into huggingface:main, Jul 23, 2025. 25 checks passed.
@SunMarc (Member) commented Jul 23, 2025:

Merged! Thanks for your work!

@SunMarc (Member) commented Jul 24, 2025:

Hey @BlackSamorez, is there a way to make fp_quant compatible with py3.9? Our CI runs on this version, but fp_quant requires 3.11.

@BlackSamorez (Contributor, Author): I guess I'll have to remove the match-case constructions and it'll work. Why run on 3.9 in 2025, though?
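For reference, that kind of rewrite is mechanical; a generic illustration (not fp_quant's actual code):

```python
def describe(method: str) -> str:
    # py3.10+ version this replaces:
    #     match method:
    #         case "abs_max": return "abs-max scaling"
    #         case "quest":   return "MSE-based scaling"
    #         case _:         raise ValueError(method)
    if method == "abs_max":
        return "abs-max scaling"
    elif method == "quest":
        return "MSE-based scaling"
    raise ValueError(f"unknown method: {method}")
```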

@SunMarc (Member) commented Jul 24, 2025:

We want to make sure that the minimum maintained Python version runs transformers correctly. When it reaches EOL, we switch to the next version.

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <[email protected]>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>