Loading model weights more efficiently #119
Comments
/milestone v0.1.0
/kind feature
/assign
We may implement a simplified P2P network for efficient model distribution. See https://github.com/InftyAI/Manta
How Transformers handles large models: https://huggingface.co/docs/transformers/big_models
/assign
/milestone v0.2.0
Generally, we have several approaches here:
Let's focus on approach 1 first, specifically for milestone v0.2.0.
What would you like to be added:
Right now we can download model weights directly from the model hub, but each time a pod starts or restarts, it downloads the model weights again. Without loading accelerators like Fluid or Dragonfly, we should find a way to handle this more efficiently. Let's focus on three things:
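To make the re-download problem concrete, here is a minimal sketch of a cache-aware loader, not this project's implementation. It assumes the cache directory lives on storage that outlives the pod (for example a node-local path or a persistent volume). The names `ensure_model`, `cache_root`, and the `.complete` marker file are hypothetical, and the actual hub download is passed in as a callable so the sketch does not depend on any particular hub client.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model(cache_root: Path, model_id: str,
                 download: Callable[[Path], None]) -> Path:
    """Return the local directory for model_id, downloading only on a cache miss.

    `download` is a hypothetical callable that writes the weights into the
    directory it is given (for example a thin wrapper around a hub client).
    """
    target = cache_root / model_id.replace("/", "--")
    if (target / ".complete").exists():
        # Cache hit: a restarted pod reuses the weights already on disk.
        return target

    cache_root.mkdir(parents=True, exist_ok=True)
    # Download into a staging directory first, so a crash mid-download
    # never leaves a half-written entry that looks valid.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=cache_root))
    download(staging)
    (staging / ".complete").touch()

    if target.exists():
        # Leftover directory without a marker: treat it as incomplete.
        shutil.rmtree(target)
    os.replace(staging, target)
    return target
```

With a layout like this, the first pod on a node pays the download cost and later starts on the same node skip it. Sharing the cache across nodes would still need something like the P2P or accelerator approaches mentioned in this thread.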
Why is this needed:
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.