HQQ Tracker #255

Closed
@HDCharles

Description

  • A16W4 axis=1

    • Low-hanging fruit: HQQ with axis=1 can be added to int4wo quantization, either behind a flag or as a replacement for the current quantization method
      • test eval with HQQ axis=1 and compare to the existing version
    • if axis=1 doesn't give enough of an accuracy improvement, we could also combine it with equalization
      • test perf/eval with HQQ axis=1 + equalization
  • A16W4+ axis=1

    • Could quantize W with mixed precision, with certain columns at 4 bit and others at 8 bit
      • may be faster to do a 4-bit matmul on all of W plus a sparse 8-bit matmul?
      • test perf for int4wo + int8 matmul for n columns
    • The HQQ+ end result is an int4wo matmul + a LoRA matmul
      • back-of-envelope numbers suggest about a 1/3 slowdown relative to int4, which is still better than int8
      • test perf for int4wo + LoRA
  • A8W4 axis=1

    • test eval accuracy with HQQ axis=1 and compare to the existing version
  • A16W3 and A16W5

    • existing numbers depend on axis=0; how do these numbers look with axis=1?
      • also relevant: whether these numbers carry over to Llama 3, since some quantization difficulty has been reported there
    • get numbers for 3/5-bit quantization with axis=1, ideally for Llama 3
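
To make the axis and bit-width comparisons above concrete, here is a minimal NumPy sketch of group-wise quantization along either axis at 3, 4 and 5 bits. It is not HQQ itself: it uses plain min-max round-to-nearest instead of HQQ's optimization step, and the function name `quantize_groupwise`, the group size of 64, the random test matrix and the axis convention (axis=1 groups consecutive elements within a row, axis=0 groups them within a column) are all assumptions made for illustration. It only shows the shape of the experiment; real eval numbers would need actual model weights.

```python
import numpy as np


def quantize_groupwise(W, nbits=4, axis=1, group_size=64):
    """Quantize then dequantize W with per-group min-max round-to-nearest.

    Hypothetical helper for illustration only (not the HQQ algorithm).
    axis=1: each group is `group_size` consecutive elements of a row.
    axis=0: each group is `group_size` consecutive elements of a column.
    """
    rows, cols = W.shape
    # Bring the grouping direction to the last dimension.
    src = W if axis == 1 else W.T
    assert src.shape[1] % group_size == 0, "grouped dim must divide evenly"
    G = src.reshape(-1, group_size)

    lo = G.min(axis=1, keepdims=True)
    hi = G.max(axis=1, keepdims=True)
    qmax = 2**nbits - 1
    # Guard against constant groups, where hi == lo.
    scale = np.maximum(hi - lo, 1e-8) / qmax

    q = np.clip(np.round((G - lo) / scale), 0, qmax)  # integer codes
    D = q * scale + lo                                # dequantized values

    if axis == 1:
        return D.reshape(rows, cols)
    return D.reshape(cols, rows).T


# Stand-in for a weight matrix; real experiments would use model weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)

errors = {}
for nbits in (3, 4, 5):
    for axis in (0, 1):
        W_hat = quantize_groupwise(W, nbits=nbits, axis=axis)
        errors[(nbits, axis)] = float(np.abs(W - W_hat).mean())
        print(f"nbits={nbits} axis={axis} mean_abs_err={errors[(nbits, axis)]:.5f}")
```

On i.i.d. random data the two axes should give similar reconstruction error; any gap between axis=0 and axis=1 on real checkpoints comes from the structure of the weights, which is why the items above call for eval runs rather than synthetic tests.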
