Loading model weights more efficiently #119

kerthcet · 2024-09-02T07:33:17Z

What would you like to be added:

Right now we can download model weights directly from the model hub, but each time we start or restart a pod, it downloads the model weights again. Without loading accelerators like Fluid or Dragonfly, we should think of a way to tackle this more efficiently. Let's focus on three things:

The first download of a model should be as quick as possible
Model weights should not be downloaded again when a pod restarts
The model cache should be handled efficiently
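
As an illustration of the second and third goals, a cache check on a persistent volume could run before any download. This is a minimal Python sketch, not an existing API in this project: `ensure_model`, the directory layout, and the `fetch` callable (whatever actually downloads the weights into a directory) are all hypothetical names chosen for the example.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model(cache_root: Path, model_id: str,
                 fetch: Callable[[Path], None]) -> Path:
    """Return the cached directory for model_id, downloading only on a miss.

    `fetch` is a hypothetical callable that writes the model files into the
    directory it is given (for example, a wrapper around a hub client).
    """
    target = cache_root / model_id.replace("/", "--")
    if target.is_dir():
        # Cache hit: a restarted pod reuses the weights already on the volume.
        return target

    cache_root.mkdir(parents=True, exist_ok=True)
    # Download into a staging directory and rename it afterwards, so a pod
    # killed mid-download never leaves a half-written model under `target`.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=cache_root))
    fetch(staging)
    os.replace(staging, target)
    return target
```

The sketch assumes a single writer per cache directory. Concurrent pods sharing one volume would need a lock or a tolerant rename, and stale staging directories would need cleanup.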

Why is this needed:

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

The artifacts should be linked in subsequent comments.

kerthcet · 2024-09-02T07:33:43Z

/milestone v0.1.0

kerthcet · 2024-09-02T07:34:03Z

/kind feature

kerthcet · 2024-09-14T04:55:20Z

/assign

kerthcet · 2024-09-18T07:04:45Z

We may implement a simplified P2P network for efficient model distribution. See https://github.com/InftyAI/Manta

kerthcet · 2024-09-20T05:54:39Z

How Transformers handles large models: https://huggingface.co/docs/transformers/big_models

kerthcet · 2024-10-08T07:50:55Z

/assign

kerthcet · 2024-12-24T14:28:24Z

/milestone v0.2.0
as Manta needs more development time.

kerthcet · 2025-04-18T09:11:26Z

Generally, we have several approaches here:

1. Without cache: stream weights to the GPU, as in "Support runai model streamer for fast model loading" (#352), to accelerate model loading.
2. With filesystem cache: use P2P technologies like Manta for in-cluster model loading. #352 can still help here by reading tensors from disk directly into GPU memory, but we need to find out whether this is inference-engine agnostic. Enterprise support: read tensors from peers to the GPU and sync the model weights as well, so a restarted pod no longer needs to read tensors from remote again. This would also benefit fine-tuning and training systems if we extend the scope in the future.
3. With OCI system cache: for example, integration with Dragonfly and a model spec implementation, see https://github.com/CloudNativeAI/modctl/blob/main/docs/getting-started.md
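
To make the streaming idea in the first approach concrete: a common way to speed up loading is to read a weights file in several chunks concurrently instead of sequentially. The sketch below only illustrates that general technique with the Python standard library. It is not the Run:ai Model Streamer API, it stops at a CPU buffer rather than copying to GPU memory, and `read_concurrently` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_concurrently(path: Path, chunk_size: int = 16 * 1024 * 1024,
                      workers: int = 8) -> bytearray:
    """Read a file into one preallocated buffer using concurrent chunk reads."""
    size = path.stat().st_size
    buf = bytearray(size)
    view = memoryview(buf)

    def read_chunk(offset: int) -> None:
        end = min(offset + chunk_size, size)
        # Each worker opens its own handle so seeks do not interfere.
        with open(path, "rb") as f:
            f.seek(offset)
            pos = offset
            while pos < end:
                n = f.readinto(view[pos:end])
                if not n:
                    raise IOError(f"unexpected end of file at byte {pos}")
                pos += n

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises any exception from a worker.
        list(pool.map(read_chunk, range(0, size, chunk_size)))
    return buf
```

Whether this helps in practice depends on the storage backend. Network and object storage usually benefit from parallel reads far more than a single local disk does.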

kerthcet · 2025-04-18T09:11:55Z

Let's focus on approach 1 first, targeting milestone v0.2.0 specifically.

InftyAI-Agent added the needs-triage, needs-kind, and needs-priority labels Sep 2, 2024

InftyAI-Agent added this to the v0.1.0 milestone Sep 2, 2024

InftyAI-Agent added the feature label and removed the needs-kind label Sep 2, 2024

InftyAI-Agent assigned kerthcet and InftyAI-Agent and unassigned InftyAI-Agent Sep 14, 2024

InftyAI deleted a comment from InftyAI-Agent Sep 14, 2024

kerthcet removed their assignment Oct 8, 2024

InftyAI-Agent assigned kerthcet Oct 8, 2024

kerthcet mentioned this issue Oct 8, 2024

gh can't assign to somebody not a member of the org cli/cli#9620

Open

kerthcet mentioned this issue Nov 14, 2024

Accelerate model loading #103

Closed

3 tasks

InftyAI-Agent modified the milestones: v0.1.0, v0.2.0 Dec 24, 2024

kerthcet mentioned this issue Apr 18, 2025

Support runai model streamer for fast model loading #352

Open

3 tasks

kerthcet modified the milestones: v0.2.0, v0.3.0 Apr 23, 2025
