Matmul benchmarking: case without tile quantization: #1980
Conversation
Force-pushed from 97c6ee9 to d0c3fc9
Force-pushed from da4b1b7 to 4d8daea
@@ -20,6 +20,7 @@ if(USE_CUDA)
softmax_backward.cpp
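The hunk above adds a new benchmark source file to the build. A minimal sketch of the assumed surrounding CMake (the list and variable names are hypothetical, not taken from the PR):

```cmake
# Hypothetical sketch of the source list being extended; names are assumptions.
if(USE_CUDA)
  list(APPEND NVFUSER_BENCHMARK_SRCS
    softmax_backward.cpp  # line added by the hunk above
  )
endif()
```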
Note to myself: I have split this file out and merged separately in #2007
This file is no longer needed here.
benchmarks/cpp/nvfuser/matmul.cpp (outdated)
@@ -0,0 +1,356 @@
#include <torch/csrc/jit/codegen/cuda/arith.h>
Note to myself: I have split this file out and merged separately in #2007
This file is no longer needed here.
After rebasing, this PR is just a trivial PR adding a test; I will merge it now at the bottom of the stack.
… options (#1978)
* pipe through cpasyncCG
* Matmul benchmarking: case without tile quantization: (#1980)
* add matmul benchmark
* more benchmark and test extension
* fixes
* fix
Co-authored-by: Xiang Gao <[email protected]>
This is the benchmarking PR in this series, tracking the resulting performance from this stack of PRs.
Most recent run on A100: