Fail to reproduce benchmark results #1135

Open
@ThisisBillhe

Hi! I am trying to reproduce the benchmark results using torchao/_models/llama/generate.py, but I cannot benchmark the quantized model successfully. Specifically, with a torch version < 2.5.0, I get the following error:

  File "/mnt/workspace/Lumina-mGPT/torchao_benchmark.py", line 310, in main
    unwrap_tensor_subclass(model)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 287, in unwrap_tensor_subclass
    unwrap_tensor_subclass(child)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torchao/utils.py", line 286, in unwrap_tensor_subclass
    parametrize.register_parametrization(child, "weight", UnwrapTensorSubclass())
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/utils/parametrize.py", line 562, in register_parametrization
    parametrizations = ParametrizationList([parametrization], original, unsafe=unsafe)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/utils/parametrize.py", line 173, in __init__
    originali = Parameter(originali)
  File "/mnt/workspace/anaconda3/envs/lumina_mgpt/lib/python3.10/site-packages/torch/nn/parameter.py", line 40, in __new__
    return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
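
The last frame of the traceback can be reproduced in isolation. The sketch below assumes (this is a guess, not something confirmed from the torchao source) that the tensor reaching Parameter() holds integer quantized data; the int8 tensor here is only a stand-in for it.

```python
import torch

# Stand-in for integer quantized data (an assumption, not the real torchao weight).
int_weight = torch.zeros(4, dtype=torch.int8)

try:
    # requires_grad defaults to True, which integer dtypes do not support.
    torch.nn.Parameter(int_weight)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# With requires_grad=False the same tensor can be wrapped without an error.
frozen = torch.nn.Parameter(int_weight, requires_grad=False)
print(frozen.dtype, frozen.requires_grad)
```

If that guess is right, the failure comes from wrapping non-floating-point data in a Parameter that requires gradients, not from the benchmark script itself.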

After upgrading torch to 2.5.0, the process gets stuck and stops responding for a very long time:

Using device=cuda
Loading model ...
Time to load model: 54.85 seconds
Compiling Model
^C^C^C^C^C^C

I do not see any CPU usage in top, and I have to kill the process by its PID.
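
To see where the process is waiting, a generic approach is to have Python dump every thread's stack if a call runs past a timeout. The sketch below uses only the standard-library faulthandler module; run_with_stack_dumps and its parameters are hypothetical names, not part of torchao or generate.py.

```python
import faulthandler
import sys


def run_with_stack_dumps(fn, timeout_s=60.0, out=None):
    """Call fn(); while it is still running, dump all thread stacks every timeout_s seconds.

    `out` must be a file object with a real file descriptor (defaults to sys.stderr).
    """
    target = sys.stderr if out is None else out
    # Arms a watchdog thread that writes tracebacks after each timeout period.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    try:
        return fn()
    finally:
        # Always disarm the watchdog, whether fn() returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

The compile or warm-up step could then be wrapped as run_with_stack_dumps(lambda: ..., timeout_s=120), with the lambda body being whatever call appears to hang. The dumped stacks only show Python frames, so they indicate which call is blocked but not what native code is doing underneath.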

Also, is there any way to accelerate a Hugging Face model by quantizing it with torchao, without converting the model format?
