[FEATURE] Add dynamic support for AutoRound quantization #329

@Qubitium

@wenhuach21 GPTQModel has merged dynamic per-layer/module control of quantization, but I don't think auto-round currently supports this kind of per-layer/module control during quantization. I know this is something AutoRound also wants. Is there any way we can work together to standardize the data interface for passing the dynamic info to auto-round? Since this feature is new, I am open to changing the protocol within GPTQModel itself if AutoRound has better suggestions. Thanks.

https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_dynamic.py

Ref: dynamic inference port to vllm (will port to sglang after vllm merge) vllm-project/vllm#7086

Both the quantizers (GPTQModel and AutoRound) and the inference libraries (vllm, sglang) need to receive the per-layer/module dynamic overrides. It would be nice if everyone could agree on the same format, or at least a similar one, to avoid compatibility issues.
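
To make the discussion concrete, here is a minimal sketch of how a shared override table could be resolved on either the quantizer or the inference side. It assumes a regex-keyed dict that maps module-name patterns to the fields they override. The names `BASE_CONFIG`, `DYNAMIC_OVERRIDES`, and `resolve_module_config` are hypothetical and are not taken from GPTQModel, AutoRound, vllm, or sglang; the linked test_dynamic.py remains the reference for the actual format.

```python
import re
from typing import Any, Dict

# Hypothetical base settings applied to every module unless overridden.
BASE_CONFIG: Dict[str, Any] = {"bits": 4, "group_size": 128}

# Hypothetical override table: regex on the module name -> fields to override.
# Plain strings and dicts only, so it can be stored in a JSON config file.
DYNAMIC_OVERRIDES: Dict[str, Dict[str, Any]] = {
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    r".*\.19\..*": {"bits": 8},
}


def resolve_module_config(
    module_name: str,
    base: Dict[str, Any] = BASE_CONFIG,
    overrides: Dict[str, Dict[str, Any]] = DYNAMIC_OVERRIDES,
) -> Dict[str, Any]:
    """Return the effective quantization settings for one module."""
    resolved = dict(base)  # copy so the shared base is never mutated
    for pattern, fields in overrides.items():
        if re.match(pattern, module_name):
            resolved.update(fields)
            break  # assumption: the first matching pattern wins
    return resolved


if __name__ == "__main__":
    for name in (
        "model.layers.18.mlp.gate_proj",
        "model.layers.19.self_attn.q_proj",
        "model.layers.0.mlp.gate_proj",
    ):
        print(name, resolve_module_config(name))
```

Because the table contains only strings and plain dicts, the same data could be written by the quantizer and read back by an inference library without either side importing the other. The "first match wins" rule is one possible choice and would need to be agreed on as part of any shared protocol.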

Labels: bug
