
Conversation

@liangel-02 commented on Sep 5, 2025

Context

Currently, we need to use safe_serialization=False when saving models, as shown here. This PR enables safetensors support for torchao so that users can save and load checkpoints using safetensors. Only Float8Tensor is supported for now (Float8DynamicActivationFloat8WeightConfig, Float8WeightOnlyConfig), but extending support to other subclasses should require minimal code changes.

```python
# default for safe_serialization is True
quantized_model.push_to_hub(save_to)
```
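For context, a complete round trip under this PR might look like the sketch below; the model and Hub repo names, and the choice of Float8WeightOnlyConfig, are illustrative rather than part of this PR:

```python
# Hedged sketch of quantize -> push (safetensors) -> reload.
# "facebook/opt-125m" and "my-org/opt-125m-float8" are illustrative names.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Float8WeightOnlyConfig

quant_config = TorchAoConfig(quant_type=Float8WeightOnlyConfig())
quantized_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)
quantized_model.push_to_hub("my-org/opt-125m-float8")  # safe_serialization=True by default
reloaded = AutoModelForCausalLM.from_pretrained("my-org/opt-125m-float8")
```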

Summary

Changes to the transformers code include:

  1. In TorchAoHfQuantizer, we provide get_state_dict_and_metadata and update_state_dict_with_metadata, which flatten/unflatten a model state dict containing tensor subclasses by calling functionality built out in this PR (see the sketch after this list).
  2. In modeling_utils.py, we make the changes needed to propagate the metadata from tensor subclasses. We also add logic, similar to hqq and bnb, to load directly onto cpu rather than meta.
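To illustrate the flattening step in point 1, here is a rough sketch of the idea; the helper name and the key-naming scheme are assumptions for illustration, not the exact torchao API:

```python
# Sketch only: split tensor subclasses into plain tensors plus metadata so the
# state dict becomes safetensors-compatible. The key naming is an assumption.
import torch

def flatten_tensor_subclasses(state_dict):
    flat, metadata = {}, {}
    for name, tensor in state_dict.items():
        if type(tensor) is torch.Tensor:
            flat[name] = tensor
        else:
            # Subclasses such as Float8Tensor implement __tensor_flatten__,
            # returning their plain-tensor attribute names plus a context object.
            attr_names, ctx = tensor.__tensor_flatten__()
            for attr in attr_names:
                flat[f"{name}:{attr}"] = getattr(tensor, attr)
            metadata[name] = ctx  # must be JSON-serialized for safetensors metadata
    return flat, metadata
```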

Test Plan

Modified the unit test to allow safe serialization. Run with `python tests/quantization/torchao_integration/test_torchao.py`.

See https://huggingface.co/torchao-testing/opt-125m-Float8WeightOnlyConfig-v2-0.14.0.dev-safetensors for an example of a serialized model and a test script.
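The round-trip check has roughly the following shape (a sketch assuming an already-quantized quantized_model and a matching tokenizer in scope; not the literal test code):

```python
# Minimal save/reload round trip; not the literal unit test.
import tempfile

import torch
from transformers import AutoModelForCausalLM

with tempfile.TemporaryDirectory() as tmp:
    quantized_model.save_pretrained(tmp, safe_serialization=True)  # now supported
    reloaded = AutoModelForCausalLM.from_pretrained(tmp, torch_dtype=torch.bfloat16)
    inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
    out_before = quantized_model.generate(**inputs, max_new_tokens=16, do_sample=False)
    out_after = reloaded.generate(**inputs, max_new_tokens=16, do_sample=False)
    assert torch.equal(out_before, out_after)  # outputs survive the round trip
```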

@liangel-02 force-pushed the torchao_safetensors branch 2 times, most recently from d60acfe to 392a504 on September 5, 2025 at 21:20
@liangel-02 marked this pull request as draft on September 5, 2025 at 21:40
@Rocketknight1 (Member)

cc @MekkCyber for quantization

@liangel-02 force-pushed the torchao_safetensors branch 7 times, most recently from 7246421 to 6a26d01 on September 8, 2025 at 20:39
github-actions bot (Contributor) commented on Sep 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: torchao_integration

```diff
@@ -399,7 +401,7 @@ def test_autoquant(self):

         check_autoquantized(self, quantized_model.model.layers[0].self_attn.v_proj)

-        EXPECTED_OUTPUT = "What are we having for dinner?\n\nJane: (sighs)"
+        EXPECTED_OUTPUT = "What are we having for dinner?\n\nJessica: (smiling)"
```
Contributor

should this be reverted?

@liangel-02 (Author)

I double-checked that this fails on main as well, so I just added the correction.

@jerryzh168 (Contributor)

Looks good, please update the PR summary to align with the recent code changes as well.

cc @SunMarc @MekkCyber please check if the API changes make sense

@sayakpaul (Member)

Would be very nice to have this propagated to diffusers as well :)

@SunMarc (Member) left a comment

Thanks for this nice PR! This is a very nice feature that will bring more adoption for torchao! Excited to have this in diffusers soon as well. Left a couple of comments.

Comment on lines +4023 to +4027:

```python
if hf_quantizer.quantization_config.quant_method is QuantizationMethod.TORCHAO:
    state_dict, metadata = hf_quantizer.get_state_dict_and_metadata(self, safe_serialization)
else:
    state_dict = hf_quantizer.get_state_dict(self)
metadata["format"] = "pt"
```
@SunMarc (Member)

Maybe we remove get_state_dict completely and only keep get_state_dict_and_metadata. I think this will be clearer.

@liangel-02 (Author)

Renaming in a separate PR.

@liangel-02 marked this pull request as ready for review on September 9, 2025 at 13:51