[WIP] Apply SuperBlock to Llama #1047


Draft · mostafaelhoushi wants to merge 11 commits into main

Conversation


@mostafaelhoushi mostafaelhoushi commented Oct 10, 2024

Still work in progress.
To run:

cd torchao/_models/llama

python generate.py --checkpoint_path ${CHECKPOINT_PATH}/model.pth --superblock


pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1047

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 10, 2024
@@ -120,7 +120,7 @@ def mlp_only(mod, name):
 
 
 def superblock_only(mod, name):
-    return isinstance(mod, SupermaskLinear) and "mlp" in name
+    return isinstance(mod, SupermaskLinear)  # and "mlp" in name
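For context, a filter like superblock_only is normally applied while walking the model's named modules to decide which layers get converted. Below is a minimal sketch of that pattern using a plain PyTorch traversal rather than torchao's actual sparsification entry point; the helper name select_modules is illustrative, not part of the PR.

```python
# Illustrative only: walk the model and collect the modules a (mod, name) filter selects.
# superblock_only mirrors the predicate in the diff above; the traversal is plain PyTorch.
import torch.nn as nn

def select_modules(model: nn.Module, filter_fn):
    """Return (name, module) pairs for which filter_fn(module, name) is True."""
    return [(name, mod) for name, mod in model.named_modules() if filter_fn(mod, name)]

# Example usage with the filter from this diff:
# targets = select_modules(model, superblock_only)
# for name, mod in targets:
#     ...  # apply BSR sparsification to the selected module's weight
```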
Contributor

@mostafaelhoushi Should this be changed to SupermaskReplacementClass?

Author

Hmmm... the SupermaskReplacementClass constructor requires a lot of arguments, such as linear_sparsity and linear_sp_tilesize. How would we pass them here?
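One way such constructor arguments could be threaded through is to bind them up front with functools.partial, so the replacement still exposes the simple (mod, name) interface. This is only a sketch: the SupermaskReplacementClass signature below is assumed from the argument names mentioned in this thread, not taken from the PR.

```python
# Sketch only: pre-bind the replacement's configuration so callers can still pass a
# two-argument (mod, name) callable. SupermaskLinear / SupermaskReplacementClass and the
# keyword names linear_sparsity / linear_sp_tilesize are assumptions from this discussion;
# their import path depends on where superblock ends up (see the path discussion below).
from functools import partial

def replace_if_supermask(mod, name, *, linear_sparsity, linear_sp_tilesize):
    if isinstance(mod, SupermaskLinear):
        return SupermaskReplacementClass(
            mod,
            linear_sparsity=linear_sparsity,
            linear_sp_tilesize=linear_sp_tilesize,
        )
    return mod

# Bind the config once, then pass the resulting callable wherever a (mod, name) hook is expected:
# replacement_fn = partial(replace_if_supermask, linear_sparsity=0.8, linear_sp_tilesize=64)
```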

Author

I think I need to do some more refactoring:

  • The ViT benchmark code assumes there is a model checkpoint trained with SuperBlock, and hence that the model already has SupermaskLinear layers and parameters.
  • The GPT-Fast code I wrote used a hack that converted Linear layers to SupermaskLinear layers and then applied BSR sparsification (roughly the flow in the sketch below).
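
A minimal sketch of the trailing step of that second flow, using plain PyTorch's BSR conversion rather than the PR's SupermaskLinear path; the 64x64 block size and the usage shown are illustrative assumptions.

```python
# Illustrative sketch: convert a Linear layer's dense weight to a block-sparse (BSR)
# tensor. The PR's real path first goes through SupermaskLinear to obtain the mask;
# this only shows the final to_sparse_bsr step. Weight dims must divide the block size.
import torch
import torch.nn as nn

def weight_to_bsr(linear: nn.Linear, blocksize: int = 64) -> torch.Tensor:
    """Return the layer's weight as a BSR tensor with square blocks."""
    dense = linear.weight.detach()
    return dense.to_sparse_bsr((blocksize, blocksize))

# Usage (illustrative):
# layer = nn.Linear(256, 256)
# bsr_weight = weight_to_bsr(layer, blocksize=64)
# print(bsr_weight.layout)  # torch.sparse_bsr
```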

Contributor

Ah, I think I read your changes wrong. I was assuming you created a SupermaskReplacementClass to combine SupermaskLinear and SupermaskConv, but I see you're still using those under the hood. I think this should be fine actually.

Contributor

@jerryzh168 jerryzh168 Nov 21, 2024

Should this be torchao/prototype/sparsity? #1013

Author

I think this is because this PR was forked from an old commit that had superblock in torchao/sparsity/prototype.
When finalizing the PR we can rebase on top of main and change the directory path.

@mostafaelhoushi mostafaelhoushi changed the title [WIP] Apply SuperBlock to GPT-Fast [WIP] Apply SuperBlock to Llama Feb 6, 2025