Description
Hi! I am trying to reproduce the benchmark results using torchao/_models/llama/generate.py, but I cannot benchmark the quantized model successfully. Specifically, with a torch version < 2.5.0, I get the following error:
```
  File "/mnt/workspace/Lumina-mGPT/torchao_benchmark.py", line 310, in main
    unwrap_tensor_subclass(model)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 286, in unwrap_tensor_subclass
    parametrize.register_parametrization(child, "weight", UnwrapTensorSubclass())
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/utils/parametrize.py", line 562, in register_parametrization
    parametrizations = ParametrizationList([parametrization], original, unsafe=unsafe)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/utils/parametrize.py", line 173, in __init__
    original_i = Parameter(original_i)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/parameter.py", line 40, in __new__
    return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
```
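I considered gating the call on the installed torch version, since (as far as I understand from the torchao README) `unwrap_tensor_subclass` is only needed before `torch.compile` on torch < 2.5. A minimal sketch of what I mean — `needs_unwrap` is a helper I made up for illustration, not a torchao API:

```python
# Hypothetical workaround sketch: only call unwrap_tensor_subclass on
# torch < 2.5, where it is still required before torch.compile.
# needs_unwrap is a local helper for illustration, not part of torchao.

def needs_unwrap(torch_version: str) -> bool:
    # Parse e.g. "2.4.1+cu121" -> (2, 4), ignoring any local version suffix.
    major, minor = (int(p.split("+")[0]) for p in torch_version.split(".")[:2])
    return (major, minor) < (2, 5)

# Intended usage (assumes torch and torchao are installed):
# import torch
# from torchao.utils import unwrap_tensor_subclass
# if needs_unwrap(torch.__version__):
#     unwrap_tensor_subclass(model)
```

But I am not sure this is the intended fix, which is why I tried upgrading instead.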
After upgrading torch to 2.5.0, the process gets stuck and stops responding for a very long time:
```
Using device=cuda
Loading model ...
Time to load model: 54.85 seconds
Compiling Model
^C^C^C^C^C^C
```
I do not see any CPU usage with the `top` command, and I have to kill the process by its PID.
Also, is there any way to accelerate a Hugging Face model by quantizing it with torchao, without converting the model format?
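To make the last question concrete, something along these lines is what I have in mind — a sketch assuming torchao's `quantize_` / `int8_weight_only` API; the checkpoint name is just an example:

```python
# Sketch (untested): quantize a transformers model in place with torchao,
# with no checkpoint format conversion. Assumes a recent torchao and a CUDA
# GPU; the model name below is illustrative only.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int8_weight_only

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
quantize_(model, int8_weight_only())  # swap Linear weights for int8 subclasses
model = torch.compile(model, mode="max-autotune")
```

Is this the supported path, or is generate.py's model format required for the published benchmark numbers?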