[WIP] Apply SuperBlock to Llama #1047
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1047
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -120,7 +120,7 @@ def mlp_only(mod, name):

 def superblock_only(mod, name):
-    return isinstance(mod, SupermaskLinear) and "mlp" in name
+    return isinstance(mod, SupermaskLinear)  # and "mlp" in name
```
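For context, a predicate like `superblock_only(mod, name)` is typically consumed by iterating over a model's named modules and acting only on the matches. The sketch below is illustrative, not code from the PR: the classes and module names are stand-ins, and it shows how dropping the `"mlp" in name` check widens the match from MLP layers to every `SupermaskLinear`.

```python
# Hypothetical sketch: how a (module, name) filter predicate is consumed.
# SupermaskLinear/Other are stand-ins for the real torchao classes.

class SupermaskLinear:
    pass

class Other:
    pass

def superblock_only(mod, name):
    # After the change in this diff: match every SupermaskLinear,
    # not only those whose name contains "mlp".
    return isinstance(mod, SupermaskLinear)  # and "mlp" in name

# Stand-in for iterating model.named_modules()
modules = {
    "mlp.fc1": SupermaskLinear(),
    "attn.qkv": SupermaskLinear(),
    "norm": Other(),
}

matched = [name for name, mod in modules.items() if superblock_only(mod, name)]
# With the old predicate only "mlp.fc1" would match; now both
# SupermaskLinear layers do.
```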
@mostafaelhoushi Should this be changed to `SupermaskReplacementClass`?
hmmm... the `SupermaskReplacementClass` constructor requires a lot of arguments like `linear_sparsity`, `linear_sp_tilesize`, etc. How will we pass them here?
I think I need to do some more refactoring:
- the ViT benchmark code assumes that there is a model checkpoint trained with SuperBlock, and hence has `SupermaskLinear` layers and parameters
- the GPT-Fast code I wrote did a hack in which it converted `Linear` layers to `SupermaskLinear` layers, then applied BSR sparsification.
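The GPT-Fast-style hack described above can be sketched as a module-swap pass: walk the model, replace each `Linear` with a shape-matched `SupermaskLinear`, then hand the result to the BSR sparsifier. Everything below is a hypothetical stand-in (plain classes instead of `torch.nn` modules, a dict instead of a real model) meant only to show the shape of the conversion, not the PR's actual implementation.

```python
# Stand-ins for nn.Linear and torchao's SupermaskLinear.
class Linear:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

class SupermaskLinear(Linear):
    @classmethod
    def from_linear(cls, lin):
        # Carry the original layer's shape into the supermask variant.
        return cls(lin.in_features, lin.out_features)

def swap_linears(modules):
    # Replace every plain Linear with a SupermaskLinear; leave other
    # module types untouched. BSR sparsification would run afterwards.
    return {name: SupermaskLinear.from_linear(m) if type(m) is Linear else m
            for name, m in modules.items()}

model = {"mlp.fc1": Linear(1024, 4096), "mlp.fc2": Linear(4096, 1024)}
model = swap_linears(model)
```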
Ah, I think I read your changes wrong; I was assuming you created a `SupermaskReplacementClass` to combine `SupermaskLinear` and `SupermaskConv`, but you're still using those under the hood. I think this should be fine actually.
Force-pushed from c9738ac to 252402b
Force-pushed from 5c59c94 to 7f7bd85
should be `torchao/prototype/sparsity`? #1013
I think this is because this PR forked from an old commit that had superblock in `torchao/sparsity/prototype`. When finalizing the PR we can rebase on top of main and change the path of the directory.
Still work in progress.
To run: