A16W4 axis=1
- Low hanging fruit: we can add axis=1 HQQ to int4wo quant, either behind a flag or by replacing the existing quant method
  - test eval with HQQ axis=1 and compare to the existing version
- if axis=1 alone doesn't yield enough of an accuracy improvement, we could also combine it with equalization
  - test perf/eval with HQQ axis=1 + equalization
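To make the axis choice concrete, here is a minimal numpy sketch of 4-bit group-wise quantization with groups taken along axis=0 (the output dim) vs axis=1 (the input dim, the layout HQQ uses). It uses plain min-max round-to-nearest, not HQQ's half-quadratic solver, and the helper name is made up for illustration:

```python
import numpy as np

def quant_dequant_4bit(w, axis, group_size=32):
    """Affine 4-bit quant/dequant with one scale/zero-point per group.

    axis=0 groups along the output dim; axis=1 groups along the input
    dim (the layout HQQ uses). Min-max round-to-nearest only -- a
    stand-in for HQQ's optimized scale/zero fitting.
    """
    if axis == 0:
        # Grouping along rows is grouping along columns of the transpose.
        return quant_dequant_4bit(w.T, axis=1, group_size=group_size).T
    rows, cols = w.shape
    g = w.reshape(rows, cols // group_size, group_size)
    lo = g.min(-1, keepdims=True)
    hi = g.max(-1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)  # 4 bit -> 16 levels
    q = np.clip(np.round((g - lo) / scale), 0, 15)
    return (q * scale + lo).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
err0 = np.abs(W - quant_dequant_4bit(W, axis=0)).max()
err1 = np.abs(W - quant_dequant_4bit(W, axis=1)).max()
```

On an isotropic random matrix the two axes behave the same; the axis=1 advantage HQQ reports comes from the input-channel structure of real transformer weights, which this sketch does not model.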
A16W4+ axis=1
- Can quantize certain columns of W to 4 bit and the more sensitive columns to 8 bit
  - it may be faster to do a 4 bit matmul over all of W plus a sparse 8 bit matmul for the corrected columns
  - test perf for int4wo + an int8 matmul over n columns
- HQQ+'s end result is an int4wo matmul plus a lora matmul
  - back of envelope numbers suggest roughly a 1/3 slowdown over int4, which is still better than int8
  - test perf for int4wo + lora
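The "4 bit matmul over all of W plus a sparse 8 bit matmul" idea can be sketched as below (numpy; the column indices are hypothetical, the quantizer is simplified per-column min-max, and the "sparse" matmul is just a thin dense one over the selected columns):

```python
import numpy as np

def affine_quant_dequant(w, bits):
    """Per-column min-max affine quant/dequant (illustrative only)."""
    lo = w.min(axis=0, keepdims=True)
    hi = w.max(axis=0, keepdims=True)
    levels = 2 ** bits - 1
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((w - lo) / scale), 0, levels)
    return q * scale + lo

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
x = rng.normal(size=(4, 64))

cols = np.array([3, 17, 29])                     # hypothetical "sensitive" columns
W4 = affine_quant_dequant(W, 4)                  # 4 bit everywhere
W8_cols = affine_quant_dequant(W[:, cols], 8)    # 8 bit copy of sensitive columns
residual = W8_cols - W4[:, cols]                 # correction on top of the 4 bit matmul

y = x @ W4                 # dense 4 bit matmul over all of W
y[:, cols] += x @ residual # thin residual matmul upgrades those columns to 8 bit
```

The output for the selected columns matches what a straight 8-bit matmul over those columns would give, which is the equivalence the perf test above would rely on.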
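The int4wo + lora shape that HQQ+ ends up with can be sketched by fitting a low-rank correction to the quantization residual. A truncated SVD stands in for HQQ+'s actual lora training, so this only illustrates the structure of the compute, not the method:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))

# Simplified per-tensor 4-bit min-max quant/dequant (not HQQ's solver).
lo, hi = W.min(), W.max()
scale = (hi - lo) / 15.0
W4 = np.clip(np.round((W - lo) / scale), 0, 15) * scale + lo

# Rank-r correction of the quantization residual, playing the role of
# HQQ+'s lora term (r is an illustrative choice).
r = 8
U, S, Vt = np.linalg.svd(W - W4, full_matrices=False)
A = U[:, :r] * S[:r]   # (64, r)
B = Vt[:r]             # (r, 32)

x = rng.normal(size=(4, 64))
y = x @ W4 + (x @ A) @ B   # int4wo matmul + lora matmul
```

The lora path adds two thin matmuls on top of the int4 one, which is where the back-of-envelope ~1/3 slowdown estimate comes from.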
A8W4 axis=1
- test eval accuracy with HQQ axis=1 and compare to the existing version
A16W3 and A16W5
- existing numbers depend on axis=0; how do these numbers look with axis=1?
  - also relevant is whether these numbers scale to Llama 3, since some quantization difficulty has been reported there
  - get numbers for 3/5 bit quantization with axis=1, ideally for Llama 3
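A quick way to sanity-check 3/4/5 bit behavior with axis=1 before running a full eval is to sweep the bit width in a min-max group-wise quantizer (numpy sketch, not the HQQ solver) and compare reconstruction error:

```python
import numpy as np

def nbit_quant_dequant_axis1(w, bits, group_size=32):
    """Affine n-bit quant/dequant, groups along axis=1 (the input dim)."""
    levels = 2 ** bits - 1
    rows, cols = w.shape
    g = w.reshape(rows, cols // group_size, group_size)
    lo = g.min(-1, keepdims=True)
    hi = g.max(-1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((g - lo) / scale), 0, levels)
    return (q * scale + lo).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))
# Max reconstruction error per bit width; each extra bit roughly halves it.
errs = {b: np.abs(W - nbit_quant_dequant_axis1(W, b)).max() for b in (3, 4, 5)}
```

This only measures weight reconstruction on random data; the actual question above (eval quality on Llama 3) still needs the end-to-end numbers.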