
Conversation

@liangel-02 commented on Sep 5, 2025

Context

Currently, we need to use safe_serialization=False when saving models, as shown here. This PR enables safetensors support for torchao so that users can save and load checkpoints using safetensors. Only Float8Tensor is supported for now (Float8DynamicActivationFloat8WeightConfig, Float8WeightOnlyConfig), but extending support to other subclasses should require minimal code changes.

```python
# default for safe_serialization is True
quantized_model.push_to_hub(save_to)
```
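For context, a complete round trip under this PR might look like the sketch below; the model and Hub repo names, and the choice of Float8WeightOnlyConfig, are illustrative rather than part of this PR:

```python
# Hedged sketch of quantize -> push (safetensors) -> reload.
# "facebook/opt-125m" and "my-org/opt-125m-float8" are illustrative names.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Float8WeightOnlyConfig

quant_config = TorchAoConfig(quant_type=Float8WeightOnlyConfig())
quantized_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)
quantized_model.push_to_hub("my-org/opt-125m-float8")  # safe_serialization=True by default
reloaded = AutoModelForCausalLM.from_pretrained("my-org/opt-125m-float8")
```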

Summary

Changes to the transformers code include:

  1. In TorchAoHfQuantizer, we provide get_state_dict_and_metadata and update_state_dict_with_metadata, which flatten/unflatten a model state dict containing tensor subclasses by calling functionality built out in this PR (see the sketch after this list).
  2. In modeling_utils.py, we make the changes needed to propagate the metadata from tensor subclasses. We also add logic, similar to hqq and bnb, to load directly onto cpu rather than meta.
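To illustrate the flattening step in point 1, here is a rough sketch of the idea; the helper name and the key-naming scheme are assumptions for illustration, not the exact torchao API:

```python
# Sketch only: split tensor subclasses into plain tensors plus metadata so the
# state dict becomes safetensors-compatible. The key naming is an assumption.
import torch

def flatten_tensor_subclasses(state_dict):
    flat, metadata = {}, {}
    for name, tensor in state_dict.items():
        if type(tensor) is torch.Tensor:
            flat[name] = tensor
        else:
            # Subclasses such as Float8Tensor implement __tensor_flatten__,
            # returning their plain-tensor attribute names plus a context object.
            attr_names, ctx = tensor.__tensor_flatten__()
            for attr in attr_names:
                flat[f"{name}:{attr}"] = getattr(tensor, attr)
            metadata[name] = ctx  # must be JSON-serialized for safetensors metadata
    return flat, metadata
```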

Test Plan

Modified the unit test to allow safe serialization. Run with `python tests/quantization/torchao_integration/test_torchao.py`.

See https://huggingface.co/torchao-testing/opt-125m-Float8WeightOnlyConfig-v2-0.14.0.dev-safetensors for an example of a serialized model and a test script.
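The round-trip check has roughly the following shape (a sketch assuming an already-quantized quantized_model and a matching tokenizer in scope; not the literal test code):

```python
# Minimal save/reload round trip; not the literal unit test.
import tempfile

import torch
from transformers import AutoModelForCausalLM

with tempfile.TemporaryDirectory() as tmp:
    quantized_model.save_pretrained(tmp, safe_serialization=True)  # now supported
    reloaded = AutoModelForCausalLM.from_pretrained(tmp, torch_dtype=torch.bfloat16)
    inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
    out_before = quantized_model.generate(**inputs, max_new_tokens=16, do_sample=False)
    out_after = reloaded.generate(**inputs, max_new_tokens=16, do_sample=False)
    assert torch.equal(out_before, out_after)  # outputs survive the round trip
```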

@liangel-02 force-pushed the torchao_safetensors branch 2 times, most recently from d60acfe to 392a504 on September 5, 2025 at 21:20
@liangel-02 marked this pull request as draft on September 5, 2025 at 21:40
@Rocketknight1 (Member)

cc @MekkCyber for quantization

@liangel-02 force-pushed the torchao_safetensors branch 7 times, most recently from 7246421 to 6a26d01 on September 8, 2025 at 20:39
github-actions bot (Contributor) commented on Sep 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: torchao_integration

```diff
@@ -399,7 +401,7 @@ def test_autoquant(self):

         check_autoquantized(self, quantized_model.model.layers[0].self_attn.v_proj)

-        EXPECTED_OUTPUT = "What are we having for dinner?\n\nJane: (sighs)"
+        EXPECTED_OUTPUT = "What are we having for dinner?\n\nJessica: (smiling)"
```
Contributor

should this be reverted?

@liangel-02 (Author)

I double-checked that this fails on main as well, so I just added the correction.

@jerryzh168 (Contributor)

Looks good, please update the PR summary to align with the recent code changes as well.

cc @SunMarc @MekkCyber please check if the API changes make sense

@sayakpaul (Member)

Would be very nice to have this propagated to diffusers as well :)

@SunMarc (Member) left a comment

Thanks for this nice PR! This is a very nice feature that will bring more adoption for torchao! Excited to have this in diffusers soon as well. Left a couple of comments.

Comment on lines +4023 to +4027:

```python
if hf_quantizer.quantization_config.quant_method is QuantizationMethod.TORCHAO:
    state_dict, metadata = hf_quantizer.get_state_dict_and_metadata(self, safe_serialization)
else:
    state_dict = hf_quantizer.get_state_dict(self)
metadata["format"] = "pt"
```
@SunMarc (Member)

Maybe we remove get_state_dict completely and only keep get_state_dict_and_metadata. I think this will be clearer.

@liangel-02 (Author)

Renaming in a separate PR.

@liangel-02 marked this pull request as ready for review on September 9, 2025 at 13:51