Description
@wenhuach21 GPTQModel has merged `dynamic` per-layer/module control of quantization, but I don't think auto-round currently supports this kind of per-layer/module control during quantization. I know this is something AutoRound also wants. Is there any way we can work together to standardize the data interface for passing the `dynamic` info to auto-round? Since this feature is new, I am open to changing the protocol within gptqmodel itself if autoround has better suggestions. Thanks.
https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_dynamic.py
Ref: `dynamic` inference port to vllm (will port to sglang after the vllm merge): vllm-project/vllm#7086
Both the quantizers (GPTQModel and AutoRound) and the inference libraries (vllm, sglang) need to receive the per-layer/module `dynamic` overrides. It would be nice if everyone could agree on the same, or at least a very similar, format to avoid compatibility issues.
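
To make the discussion concrete, here is a rough sketch of what a shared format could look like: a plain mapping from a regex over module names to the fields it overrides, which both a quantizer and an inference library could resolve the same way. This is only an illustration and not the actual GPTQModel or AutoRound API. `BASE_CONFIG`, `DYNAMIC`, and `resolve_module_config` are hypothetical names, and the first-match-wins rule is an assumption that would need to be agreed on.

```python
import re
from typing import Any, Dict

# Hypothetical global quantization settings applied to every module by default.
BASE_CONFIG: Dict[str, Any] = {"bits": 4, "group_size": 128}

# Hypothetical per-module overrides: regex on the module name -> fields to override.
# A plain dict of strings and numbers like this serializes to JSON unchanged,
# so it could be stored next to the rest of the quantization config.
DYNAMIC: Dict[str, Dict[str, Any]] = {
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    r".*\.19\..*gate.*": {"bits": 8, "group_size": 64},
}


def resolve_module_config(
    module_name: str,
    base: Dict[str, Any],
    dynamic: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the effective config for one module.

    Assumption: patterns are checked in insertion order and the first
    matching pattern wins; unmatched modules keep the base config.
    """
    resolved = dict(base)  # copy so the base config is never mutated
    for pattern, overrides in dynamic.items():
        if re.match(pattern, module_name):
            resolved.update(overrides)
            break
    return resolved


if __name__ == "__main__":
    for name in ("model.layers.18.mlp.gate_proj", "model.layers.3.mlp.gate_proj"):
        print(name, resolve_module_config(name, BASE_CONFIG, DYNAMIC))
```

If the quantizer and the inference library both apply the same resolution rule to the same mapping, a checkpoint quantized with overrides should load with matching per-module settings on the other side.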