HQQ Tracker #255

Closed
2 of 6 tasks
HDCharles opened this issue May 17, 2024 · 1 comment

HDCharles commented May 17, 2024

  • A16W4 axis=1

    • Low-hanging fruit: we could add this to int4wo quant either as a flag or by replacing the quant method (see the axis=1 grouping sketch after this list)
      • test eval with HQQ axis=1 and compare to existing version
    • if axis=1 alone doesn't give a large enough accuracy improvement, we could also combine it with equalization
      • test perf/eval with HQQ axis=1 + equalization
  • A16W4+ axis=1

    • Can quantize certain columns of W to 4/8 bit
      • may be faster to do a 4-bit matmul on all of W plus a sparse 8-bit matmul for those columns? (see the mixed-precision sketch after this list)
      • test perf for int4wo + int8 matmul for n columns
    • the HQQ+ end result is an int4wo matmul + a LoRA matmul (see the sketch after this list)
      • back-of-envelope numbers look like roughly a 1/3 slowdown over int4, which is still better than int8
      • test perf for int4wo + LoRA
  • A8W4 axis=1

    • test eval accuracy with HQQ axis=1 and compare to existing version
  • A16W3 and A16W5

    • existing numbers are with axis=0; how do these numbers look with axis=1?
      • it's also relevant whether these numbers hold for llama3, since some quantization difficulty has been reported there
    • get numbers for 3/5-bit quantization with axis=1, ideally for llama3
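
For reference on the A16W4 axis=1 item: below is a minimal sketch of what groupwise 4-bit affine quantization looks like when the groups are taken along axis=1 (input channels), the layout the existing int4wo / tinygemm path expects, versus axis=0 (output channels). It uses plain round-to-nearest min/max fitting; HQQ keeps the same affine parametrization but refines the scale/zero-point with its half-quadratic solver, which is not reproduced here. Function names are illustrative, not an existing torchao API.

```python
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 64, axis: int = 1):
    """Round-to-nearest 4-bit affine quantization of a (out_features, in_features)
    weight, with scale/zero computed per group of `group_size` values along `axis`.

    axis=1 groups input channels within each output row (the layout int4wo /
    tinygemm uses); axis=0 groups output channels instead. HQQ uses the same
    affine parametrization but refines scale/zero-point with its
    half-quadratic solver rather than this plain min/max fit.
    """
    if axis == 0:
        w = w.t()                                   # group along the other dim
    out_f, in_f = w.shape
    assert in_f % group_size == 0
    wg = w.reshape(out_f, in_f // group_size, group_size)

    w_min = wg.amin(dim=-1, keepdim=True)
    w_max = wg.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-6) / 15.0  # 4 bits -> 16 levels
    zero = (-w_min / scale).round()

    q = (wg / scale + zero).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize_int4_groupwise(q, scale, zero, axis: int = 1):
    w = ((q.float() - zero) * scale).reshape(q.shape[0], -1)
    return w.t() if axis == 0 else w

# quick reconstruction-error comparison between the two grouping axes
# (eval on a real checkpoint is the actual test)
w = torch.randn(4096, 4096)
for axis in (0, 1):
    q, s, z = quantize_int4_groupwise(w, group_size=64, axis=axis)
    err = (w - dequantize_int4_groupwise(q, s, z, axis=axis)).pow(2).mean()
    print(f"axis={axis} mse={err.item():.3e}")
```

Hooking this (or the HQQ-optimized scale/zero) into int4wo quant as a flag would mainly mean swapping out how scale and zero are computed; the eval comparison can then be run on the resulting checkpoints.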
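
For the A16W4+ "certain columns at 4/8 bit" item, the "dense 4-bit matmul on all of W plus a sparse 8-bit matmul for n columns" formulation can be benchmarked with a sketch like the one below. Both operands are simulated in floating point (a real benchmark would use the int4 tinygemm kernel plus an int8 matmul); `w_corr` and `cols` are illustrative names, not existing APIs.

```python
import torch

def mixed_precision_linear(x, w_deq4, w_corr, cols):
    """y = x @ W4.T + x[:, cols] @ C.T

    w_deq4: full weight dequantized from 4 bits, shape (out, in)
    w_corr: higher-precision correction for the selected input columns,
            e.g. (W_deq8 - W_deq4)[:, cols], shape (out, n)
    cols:   indices of the n columns kept at 8 bits
    """
    y = x @ w_deq4.t()                # dense "4-bit" matmul (simulated in fp)
    y = y + x[:, cols] @ w_corr.t()   # narrow "8-bit" correction matmul
    return y

# example setup: sweep n and time against the dense-only matmul
out_f, in_f, n = 4096, 4096, 256
x = torch.randn(16, in_f)
w_deq4 = torch.randn(out_f, in_f)
cols = torch.randperm(in_f)[:n]
w_corr = torch.randn(out_f, n)
y = mixed_precision_linear(x, w_deq4, w_corr, cols)
```

Sweeping n and timing this against the dense-only matmul gives the "test perf for int4wo + int8 matmul for n columns" data point.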
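
For the "int4wo matmul + lora matmul" item, the HQQ+ forward is the base quantized matmul plus a rank-r low-rank correction, y = x @ W_q.T + (x @ A) @ B. A rough perf sketch, with the int4 weight simulated as a dense dequantized fp16 tensor (a real test would call the int4wo kernel for the base matmul); the class and parameter names here are hypothetical:

```python
import torch
import torch.nn as nn

class Int4PlusLoRALinear(nn.Module):
    """Forward pattern HQQ+ ends up with: y = x @ W_q.T + (x @ A) @ B.

    W_q stands in for the int4wo weight (simulated as a dense fp16 tensor
    after dequant); a real perf test would call the int4 tinygemm kernel for
    the base matmul. A (in x r) / B (r x out) are the low-rank terms HQQ+
    trains to absorb the quantization error.
    """
    def __init__(self, in_features, out_features, rank=64, dtype=torch.float16):
        super().__init__()
        self.w_q = nn.Parameter(torch.randn(out_features, in_features, dtype=dtype))
        self.lora_a = nn.Parameter(torch.randn(in_features, rank, dtype=dtype))
        self.lora_b = nn.Parameter(torch.zeros(rank, out_features, dtype=dtype))

    def forward(self, x):
        base = x @ self.w_q.t()                    # int4wo matmul (simulated)
        lora = (x @ self.lora_a) @ self.lora_b     # two skinny rank-r matmuls
        return base + lora

# usage (on GPU): time base-only vs base + LoRA across ranks
# layer = Int4PlusLoRALinear(4096, 4096, rank=64).cuda()
# x = torch.randn(16, 4096, dtype=torch.float16, device="cuda")
# y = layer(x)
```

Timing base-only against base + LoRA across ranks gives the slowdown-over-int4 number to check against the back-of-envelope 1/3 estimate.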

mobicham (Collaborator) commented:

@HDCharles Thanks! What do you mean by equalization?
