
Allow weights_only=True load for gemlite layout #2081


Closed · jerryzh168 wants to merge 1 commit

Conversation

jerryzh168 (Contributor)


Summary:
This PR adds a few imports from gemlite so that gemlite checkpoints
can be loaded with `weights_only=True` in Hugging Face (which is the default):

`torch.load(gemlite_checkpoint, weights_only=True)`

Note: we need to remove `getattr` in the future.
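
For context, `weights_only=True` restricts unpickling to an allowlist of safe globals, so a checkpoint that references gemlite classes fails to load until those classes are registered. A minimal sketch of the allowlisting mechanism this PR relies on (the checkpoint path here is hypothetical; the PR itself also registers `getattr`, which is flagged as unsafe):

```
import torch
from gemlite.core import DType, GemLiteLinearTriton

# Without registration, torch.load(..., weights_only=True) rejects
# non-allowlisted globals with an UnpicklingError naming the class.
torch.serialization.add_safe_globals([DType, GemLiteLinearTriton])

# With the allowlist in place, the restricted unpickler accepts the
# gemlite classes referenced by the checkpoint.
state = torch.load("gemlite_checkpoint.pt", weights_only=True)
```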

Test Plan:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "jerryzh168/phi4-mini-int4wo-gemlite"

quantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
generated_ids = quantized_model.generate(**inputs, max_new_tokens=128)
output_text = tokenizer.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot bot commented Apr 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2081

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit e91f88d with merge base 34421b1:

FLAKY - The following job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 19, 2025
@jerryzh168 added the topic: improvement label Apr 19, 2025
```
from gemlite.core import DType, GemLiteLinearTriton

# TODO: we need to remove `getattr` since it's unsafe (flagged by picklescan)
torch.serialization.add_safe_globals([DType, GemLiteLinearTriton, getattr])
```
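
As a side note (not part of this PR), newer PyTorch releases also provide `torch.serialization.safe_globals` as a context manager, which scopes the allowlist instead of mutating it globally; a minimal sketch assuming the same gemlite imports and a hypothetical checkpoint path:

```
import torch
from gemlite.core import DType, GemLiteLinearTriton

# Scoped allowlist: the gemlite classes are only accepted by the
# weights_only unpickler inside this block.
with torch.serialization.safe_globals([DType, GemLiteLinearTriton]):
    state = torch.load("gemlite_checkpoint.pt", weights_only=True)
```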

Since HF flags this, would it make sense to wait until we see whether gemlite can remove the need for `getattr` on this one?

@mobicham (Collaborator) commented Apr 22, 2025

We can remove this part; I fixed this issue in #2096.

@jerryzh168 (Contributor, Author)

No longer needed since the serialization issue is fixed in #2096.

@jerryzh168 closed this Apr 24, 2025