Adding torchao apis to gpt-fast #208

HDCharles · 2024-10-17T07:49:14Z

Summary:

Adds torchao APIs to gpt-fast, plus some minor tweaks.

Test Plan:

(in progress)
export MODEL_REPO=meta-llama/Meta-Llama-3-8B

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int8.pth
wikitext: {'word_perplexity,none': 7.900496793735154, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4718578218273202, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5576383170121927, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4-hqq
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4-hqq.pth
wikitext: {'word_perplexity,none': 8.44187872159186, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4902143610748824, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.575519871235033, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4.pth
wikitext: {'word_perplexity,none': 8.59031159441983, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4950796712267396, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5802223661766339, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
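The three eval.py results above are easier to compare side by side. A quick consistency check, assuming lm-eval's usual definition, is that bits_per_byte equals log2(byte_perplexity). The sketch below (values copied from the output above; the helper names are hypothetical) checks that relation and reports how much word perplexity rises relative to torchao-int8, which is used as the reference only because no unquantized baseline is reported here.

```python
import math

# wikitext results for Meta-Llama-3-8B, copied from the eval.py output above
RESULTS = {
    "torchao-int8": {"word_perplexity": 7.900496793735154,
                     "byte_perplexity": 1.4718578218273202,
                     "bits_per_byte": 0.5576383170121927},
    "torchao-int4-hqq": {"word_perplexity": 8.44187872159186,
                         "byte_perplexity": 1.4902143610748824,
                         "bits_per_byte": 0.575519871235033},
    "torchao-int4": {"word_perplexity": 8.59031159441983,
                     "byte_perplexity": 1.4950796712267396,
                     "bits_per_byte": 0.5802223661766339},
}

# Reference chosen for illustration only; no bf16 baseline was reported
REFERENCE = "torchao-int8"


def check_bits_per_byte(r: dict) -> bool:
    # Assumed relation (lm-eval style): bits_per_byte = log2(byte_perplexity)
    return math.isclose(r["bits_per_byte"], math.log2(r["byte_perplexity"]), rel_tol=1e-6)


def word_ppl_increase_pct(mode: str) -> float:
    # Relative increase in word perplexity versus the reference mode, in percent
    ref = RESULTS[REFERENCE]["word_perplexity"]
    return 100.0 * (RESULTS[mode]["word_perplexity"] / ref - 1.0)


for mode, r in RESULTS.items():
    print(f"{mode:18s} word_ppl={r['word_perplexity']:.3f} "
          f"(+{word_ppl_increase_pct(mode):.1f}% vs {REFERENCE}) "
          f"bits_per_byte consistent: {check_bits_per_byte(r)}")
```

On these numbers, HQQ-calibrated int4 scores slightly better than plain torchao-int4 on all three metrics.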


Baselines with the existing (non-torchao) int8 and int4 modes:

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth --tasks wikitext --compile

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth --tasks wikitext --compile

HDCharles · 2024-10-17T14:01:12Z

@Chillee, should I add info to the README, or implement this differently?


gchlebus · 2024-11-08T14:31:45Z

Is the plan for this PR also to add fp8 support, which is available in torchao?

jerryzh168 · 2024-12-20T04:29:14Z

It seems that this one does not work with tensor parallelism (TP) yet:

ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth

[rank1]: NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.split_with_sizes', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>, <class 'list'>), kwarg_types={}

We will need to implement this op in AQT (AffineQuantizedTensor) to support this, or change the TP implementation to use DTensor, I guess.
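A likely reason the TP run reaches aten.split_with_sizes: we believe gpt-fast's tp.py shards the fused wqkv projection by first splitting it into q, k and v with a list of sizes (Tensor.split with a list of sizes dispatches to aten.split_with_sizes), sharding each piece across ranks, and concatenating the shards. The pure-Python sketch below mimics that logic on a list of weight rows; the function names are hypothetical, and it uses neither torch nor AQT.

```python
from typing import List, Sequence

Rows = List[List[float]]  # a weight matrix as a list of output rows


def split_with_sizes(rows: Rows, sizes: Sequence[int]) -> List[Rows]:
    """Pure-Python stand-in for aten.split_with_sizes along dim 0."""
    assert sum(sizes) == len(rows), "sizes must cover every row"
    out: List[Rows] = []
    start = 0
    for n in sizes:
        out.append(rows[start:start + n])
        start += n
    return out


def shard(rows: Rows, rank: int, world_size: int) -> Rows:
    """Keep this rank's contiguous slice of rows (column-wise TP on dim 0)."""
    assert len(rows) % world_size == 0, "rows must divide evenly across ranks"
    step = len(rows) // world_size
    return rows[rank * step:(rank + 1) * step]


def shard_fused_qkv(wqkv: Rows, q_size: int, kv_size: int,
                    rank: int, world_size: int) -> Rows:
    # Split the fused weight back into q, k, v. This is the step that needs
    # split_with_sizes, which AffineQuantizedTensor did not implement.
    q, k, v = split_with_sizes(wqkv, [q_size, kv_size, kv_size])
    # Shard each projection separately, then re-fuse this rank's pieces.
    return shard(q, rank, world_size) + shard(k, rank, world_size) + shard(v, rank, world_size)
```

Slicing the fused weight directly would hand rank 0 all of q and none of k or v, which is why the split has to happen before sharding, and why the quantized weight tensor must support it (or the sharding must move to DTensor).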

meta-cla · 2025-08-09T00:37:44Z

Hi @HDCharles!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot added the CLA Signed label Oct 17, 2024

HDCharles requested a review from Chillee October 17, 2024 14:00

Adding info to readme

7144ffb


jerryzh168 mentioned this pull request Dec 20, 2024: int4 quant broken right now? #217 (Open)
