-
Notifications
You must be signed in to change notification settings - Fork 699
Migrate CPU Backend to allocate memory at runtime. #1943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this work @gcatron . I am excited to see the fast development on this PR. I made a few comments in the code, mainly regarding comments and style.
I asked a few questions, such as "is it possible to forward declare a class". if the answer is "no" then feel free to ignore the question and move on.
Please make sure to run the ./run.sh tests with the --iterations=10 and -cpu flags and check that there are no noticeable performance regressions. I expect none but we need to be sure.
Also, please make sure to squash the PR down to one or two commits, or whatever is reasonable.
@opti-mix is the expert in this part of the project, so if he approved the design then feel free to land (after fixing the style issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with this code path :)
Question:
Do you allocate memory for weights and activations every time on inference request?
I think longer term we could keep activations in thread local memory, facilitate thread safety and avoidance of allocation/deallocation overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you test this change :)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable as a first step.
I believe with want to be more aggressive on the refactoring going forward.
Have you verified that AOT still works :)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have information about JIT under /docs in the repo, could you check if anything needs to be updated there?
Looking good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Looking good!
44c1a35
to
e320fe6
Compare
Created new RuntimeBundle to pass information forward from compile time to runtime
c5a126e
to
d32e0e8
Compare
Congrats! Nice work. |
Description: Migrate CPU Backend to allocate and initialize memory at runtime. Added a new runtimeInfo object that contains information needed for runtime memory allocation.
Testing: ninja test passes, AOT bundles still work, testing with run.sh --iterations=20 -cpu -time showed minimal performance difference.
Documentation: N/A
Progress on #1904