opencl: improve profiling #12442
Conversation
lhez commented on Mar 18, 2025
- Wait for profiling events and collect profiling data once model execution is done. This way, the displayed performance numbers are closer to the true performance.
- Generate a Chrome trace in addition to the CSV output.
- Populate profiling timing info at the end rather than after each kernel run (see the sketch below).
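The approach described above can be illustrated with a rough sketch. This is not the actual ggml-opencl implementation: the names `profiled_event`, `collect_profiling`, and `write_chrome_trace` are hypothetical, and only the standard OpenCL profiling API (`clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START`/`END`) and the Chrome trace event JSON format are assumed.

```cpp
// Sketch only: illustrative names, not the identifiers used in ggml-opencl.
#include <CL/cl.h>
#include <cstdio>
#include <string>
#include <vector>

struct profiled_event {
    std::string kernel_name;
    cl_event    evt;
    cl_ulong    start_ns = 0;
    cl_ulong    end_ns   = 0;
};

// Collect timing for all recorded events after the whole graph has finished,
// instead of blocking and querying after every kernel launch.
static void collect_profiling(std::vector<profiled_event> & events) {
    for (auto & e : events) {
        clWaitForEvents(1, &e.evt);
        clGetEventProfilingInfo(e.evt, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &e.start_ns, nullptr);
        clGetEventProfilingInfo(e.evt, CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &e.end_ns,   nullptr);
        clReleaseEvent(e.evt);
    }
}

// Emit a Chrome trace (viewable in chrome://tracing or Perfetto) alongside the CSV.
static void write_chrome_trace(const std::vector<profiled_event> & events, const char * path) {
    FILE * f = fopen(path, "w");
    if (!f) return;
    fprintf(f, "{\"traceEvents\":[\n");
    for (size_t i = 0; i < events.size(); ++i) {
        const auto & e = events[i];
        double ts_us  = e.start_ns / 1000.0;                 // Chrome trace timestamps are microseconds
        double dur_us = (e.end_ns - e.start_ns) / 1000.0;    // kernel duration in microseconds
        fprintf(f, "  {\"name\":\"%s\",\"ph\":\"X\",\"ts\":%.3f,\"dur\":%.3f,\"pid\":0,\"tid\":0}%s\n",
                e.kernel_name.c_str(), ts_us, dur_us, i + 1 < events.size() ? "," : "");
    }
    fprintf(f, "]}\n");
    fclose(f);
}
```

Deferring `clWaitForEvents`/`clGetEventProfilingInfo` until after the graph completes avoids synchronizing the command queue per kernel, which is what otherwise skews the reported timings.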
Sorry to bother you, how can I mark a specific PR as ready for review? Thanks.
Looks good, and it reminds me that I wanted to integrate the graph-profiler branch with the opencl and cpu backends.
If you start a Draft PR, there is a way to mark it as ready. The qnn-backend PR is not marked as a draft.
I've been keeping an eye on it. In general, I'd say QNN is not the right solution here, but I'll take another look.
@max-krasnyansky, thanks so much for your valuable guidance and correction on direction. I think I understand something about the third tech approach of "utilize the Hexagon NPU maximally": in other words, the Hexagon DSP SDK should be used directly in that approach, which is quite similar to what your excellent engineering team did with ggml-opencl, or to what I did with video decoding hardware acceleration many years ago (that was also a DSP chip). My guess might not be correct, so it would be greatly appreciated if you could give me and the llama.cpp community a clear explanation, or at least a rough confirmation, of the third tech approach.