Moved OpenCL memory allocation to runtime. #1958
I suspect it has something to do with copying weights onto the device per inference. Is a migration plan tracked anywhere for eventually changing this to copying weights once per inference?
vgg19 inference times were very similar between master and the current branch. The migration is called out as a TODO in the execution engine, which points back to #1904.
Looking good now! Please fix the small nits from my last comments and you're good to go.
@gcatron And do not forget to squash the commits if needed.
Created a new BackendUtils library and put collectConstants there. Added a helper function to retrieve a symbol's offset by value from the symbolTable.
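For readers unfamiliar with the idea, a lookup of a symbol's offset by value could look roughly like the sketch below. It is only an illustration under stated assumptions: `Value`, `SymbolInfo`, `SymbolTable`, and `getSymbolOffset` are hypothetical stand-ins chosen for this example and are not taken from the actual BackendUtils code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for illustration; not the project's real types.
struct Value {
  std::string name; // assumed: each value is identified by a unique name
};

struct SymbolInfo {
  std::size_t offset; // byte offset of the symbol inside a device buffer
  std::size_t size;   // size of the symbol in bytes
};

using SymbolTable = std::unordered_map<std::string, SymbolInfo>;

// Returns the offset recorded for `v`, or std::nullopt if `v` has no entry.
std::optional<std::size_t> getSymbolOffset(const SymbolTable &table,
                                           const Value &v) {
  auto it = table.find(v.name);
  if (it == table.end()) {
    return std::nullopt;
  }
  return it->second.offset;
}
```

Returning `std::optional` rather than a sentinel keeps a missing symbol distinguishable from a valid offset of 0; the real helper may well handle that case differently.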
682a785 to 7b7faf9
Description: Move OpenCL memory allocation to runtime. This is in support of #1904.
Testing: Ran ninja test; all tests pass. Ran run.sh --iterations=10 -opencl -time. Accuracy is now the same as master. Times are almost identical for all models except ResNet-50.
Master:
Wall time per iteration (s): 0.1335
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.1925 (100.0%) 0.1420 (100.0%) 1.3345 (100.0%) 1.3350 (100.0%) Infer
1.1925 (100.0%) 0.1420 (100.0%) 1.3345 (100.0%) 1.3350 (100.0%) Total
Wall time per iteration (s): 0.4013
Current Branch:
Wall time per iteration (s): 0.1628
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.8187 (100.0%) 0.1483 (100.0%) 0.9670 (100.0%) 1.6282 (100.0%) Infer
0.8187 (100.0%) 0.1483 (100.0%) 0.9670 (100.0%) 1.6282 (100.0%) Total
Wall time per iteration (s): 0.7370
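As a sanity check on the numbers above, the first per-iteration figure in each run matches the Infer wall time divided by the 10 iterations passed via --iterations=10. The sketch below redoes that arithmetic; the function names are hypothetical and the output does not say which model each timing block belongs to, so no model is assumed here.

```cpp
#include <cmath>

// Hypothetical helper: average wall time of one iteration.
double wallTimePerIteration(double totalWallSeconds, int iterations) {
  return totalWallSeconds / iterations;
}

// Hypothetical helper: relative change of `candidate` versus `baseline`
// (0.10 means 10% slower, -0.10 means 10% faster).
double relativeChange(double baseline, double candidate) {
  return (candidate - baseline) / baseline;
}
```

With the Infer rows above, `wallTimePerIteration(1.3350, 10)` gives 0.1335 for master and `wallTimePerIteration(1.6282, 10)` gives about 0.1628 for the current branch, so `relativeChange(1.3350, 1.6282)` comes out at roughly 0.22, i.e. about 22% more wall time. User time moves the other way, from 1.1925 s to 0.8187 s.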
Documentation: N/A
Progress on #1904