Handle module names from Dynamo compiler in FP8 Quantizer #2223


Open · wants to merge 1 commit into master
Conversation

sandeep-maddipatla

Type of Change

  • Bug fix, with no API change.

Description

  • The Measure component generates stats keyed by the original module names from the model, whereas torch.compile refers to the same layers by altered names in the compiled model.
  • The quantizer needs to look up the measured stat dumps under the original module names. This PR captures the Quantizer change that does this; see the sketch below.
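
For illustration, a minimal standalone sketch of the naming mismatch (not part of this PR): torch.compile wraps the model in an OptimizedModule, so named_modules() reports every layer under an _orig_mod. prefix, while the measured stats are keyed by the unprefixed names. The original_name helper below is hypothetical.

import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
compiled = torch.compile(model)

print([n for n, _ in model.named_modules() if n])     # ['0']
print([n for n, _ in compiled.named_modules() if n])  # ['_orig_mod', '_orig_mod.0']

# Stripping the prefix recovers the names the stats were dumped under
# (hypothetical helper, not the PR's actual code):
def original_name(name: str) -> str:
    return name.removeprefix("_orig_mod.")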

Expected Behavior & Potential Risk

Currently, the quantizer searches for the stats using the altered module names from the compiled model and fails with an exception such as the one below:

Exception: Error - Layer '_orig_mod.x_embedder' was called but was not quantized because no measures were supplied.

With this PR, this error is no longer generated.

How has this PR been tested?

The test script below reproduces the problem. Re-run this with the PR in place to verify there is no error.

# Fetch sources, install dependencies
pip install optimum-habana sentencepiece
git clone https://github.com/huggingface/optimum-habana
cd /path/to/working-dir
cp -r /path/to/optimum-habana/examples/stable-diffusion/quantization .
huggingface-cli login --token YourHFTokenGoesHere

Save the test script below as reproducer.py in the working directory (the run commands later invoke it by that name).

import os
import torch
from optimum.habana.diffusers import GaudiFlowMatchEulerDiscreteScheduler, GaudiFluxPipeline

mode = os.environ.get('MODE', 'quant')

# load model
model_name = "black-forest-labs/FLUX.1-dev"
scheduler = GaudiFlowMatchEulerDiscreteScheduler.from_pretrained(
    model_name,
    subfolder="scheduler"
)
pipe = GaudiFluxPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=False,
    gaudi_config="Habana/stable-diffusion",
    bf16_full_eval=True,
    torch_dtype=torch.bfloat16
)

if mode == 'measure':
    # dump measure stats through INC
    os.environ["QUANT_CONFIG"] = "quantization/flux/measure_config.json"
    pipe(
        prompt="A picture of sks dog in a bucket",
        quant_mode="measure",
    )
    print('Measurement step done')
elif mode == 'quant':
    # quantize with INC (from measured stats)
    os.environ["QUANT_CONFIG"] = "quantization/flux/quantize_config.json"
    pipe.transformer = torch.compile(pipe.transformer, backend="hpu_backend")
    image = pipe(
        prompt="A picture of sks dog in a bucket",
        quant_mode="quantize"
    ).images[0]
    image.save("output_image.png")
    print('Quant Step done')
else:
    print(f'Unrecognized setting for MODE={mode}')

Run the two-step quantization with the commands below: first the measure step, then the quant step.

MODE=measure PT_HPU_LAZY_MODE=0 python reproducer.py
MODE=quant PT_HPU_LAZY_MODE=0 python reproducer.py

Dependency Change?

No library / dependency changes

Commit: Quantizer equivalent for how the measure component handles the same scenario
@skaulintel

looks good to me.

@thuang6 thuang6 requested review from xin3he and yiliu30 June 16, 2025 03:08
@xin3he
Contributor

xin3he commented Jun 16, 2025

I'd like to get comments from the Habana team. @ulivne and @linoybu, please take a look~

Contributor

@yiliu30 yiliu30 left a comment

Does the current flow pass the compiled model to INC? I think we should compile the model after it is processed by INC, as INC assumes the input model is an eager model.
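
For reference, a hedged sketch of the flow recommended above, assuming INC's neural_compressor.torch.quantization API (FP8Config.from_json_file and convert) and the quantize config path from the reproducer: let INC patch the eager transformer first, then compile the patched model.

from neural_compressor.torch.quantization import FP8Config, convert

# Patch the eager model with INC first...
config = FP8Config.from_json_file("quantization/flux/quantize_config.json")
pipe.transformer = convert(pipe.transformer, config)
# ...then compile the already-patched model.
pipe.transformer = torch.compile(pipe.transformer, backend="hpu_backend")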

@sandeep-maddipatla
Author

Hi @yiliu30, thank you for your response! Is this limitation of INC (expecting an eager model) documented? If so, can you please point me to it?

I also see that the INC measure counterpart for this script handles the altered layer names from torch.compile, so I assumed this flow was supported.

@yiliu30
Contributor

yiliu30 commented Jun 25, 2025

Hi @sandeep-maddipatla, we haven't documented it yet. But the INC FP8 quantization logic replaces Torch modules with patched ones; for example, we replace torch.nn.Linear with PatchedLinear. When creating the patched module, we copy some attributes from the original one, and the compile process may modify some of them.
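
As a rough illustration only (not INC's actual implementation; PatchedLinearSketch and its scale handling are stand-ins), the replace-and-copy pattern described above looks something like this:

import torch

class PatchedLinearSketch(torch.nn.Module):
    # Conceptual stand-in for INC's PatchedLinear, not the real class.
    def __init__(self, orig: torch.nn.Linear, scale: float):
        super().__init__()
        self.weight = orig.weight  # attributes copied from the original module
        self.bias = orig.bias
        self.scale = scale         # quantization metadata attached during patching
    def forward(self, x):
        # Real FP8 code would cast here; plain scaling is a placeholder.
        return torch.nn.functional.linear(x * self.scale, self.weight, self.bias)

# If the model is compiled before patching, the attribute copy above can see
# compile-altered attributes and module names, hence the eager-first advice.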

Have you encountered any issues with the recommended flow?
