SO150M was covered in the paper. I modified it to run faster on GPU, since both SO150M and SO400M are poor shapes for GPU kernels. I didn't have much luck working with the exponents in the paper. For the |
How were the shape parameters derived for the shape-optimal models in SO150M? Were the scaling exponents from the "Getting ViT in Shape" paper used? The analysis behind this would be helpful, as I am trying to derive other shape-optimized models.
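
For anyone else trying the same thing, here is a minimal sketch of the kind of extrapolation the question is about. It assumes each shape dimension follows a power law in compute (dimension proportional to compute raised to a per-dimension exponent) and then snaps the result to a GPU-friendly multiple. The reference shape, the exponents, and the rounding multiples below are hypothetical placeholders chosen for illustration; they are not the values from the paper and not the procedure used for SO150M.

```python
def round_to_multiple(value, multiple):
    """Round value to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def scale_shape(ref_shape, exponents, compute_ratio, multiples):
    """Scale each dimension as ref * compute_ratio ** exponent, then round.

    ref_shape:     dimensions of a known reference model
    exponents:     per-dimension scaling exponents (placeholders here)
    compute_ratio: target compute divided by reference compute
    multiples:     per-dimension rounding granularity for kernel efficiency
    """
    scaled = {}
    for name, ref_value in ref_shape.items():
        raw = ref_value * compute_ratio ** exponents[name]
        scaled[name] = round_to_multiple(raw, multiples[name])
    return scaled


# All values below are illustrative assumptions, not numbers from the paper.
ref_shape = {"width": 768, "depth": 12, "mlp_dim": 3072}
exponents = {"width": 0.25, "depth": 0.5, "mlp_dim": 0.5}
multiples = {"width": 64, "depth": 1, "mlp_dim": 128}

shape = scale_shape(ref_shape, exponents, compute_ratio=4.0, multiples=multiples)
print(shape)
```

The rounding step is kept separate from the scaling step so that the raw power-law prediction and the hardware-driven adjustment can be inspected independently.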