Demonstrate adding a cuda build + cuda custom op to torchao #130
Conversation
    ]
}
if debug_mode:
    extra_compile_args["cxx"].append("-g")
Will likely need to run conda install cuda -c nvidia in CI, since I don't think nvcc will be available there.
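A common way to handle this (a sketch under assumptions, not the PR's actual code; the function names are illustrative) is to probe for nvcc before requesting the CUDA build, so a CPU-only CI machine still builds the C++ parts:

```python
import shutil


def cuda_toolkit_available() -> bool:
    # nvcc being on PATH is a cheap proxy for a usable CUDA toolkit.
    return shutil.which("nvcc") is not None


def get_extension_compile_args(debug: bool = False) -> dict:
    # Mirrors the debug_mode logic in the diff above, plus a guard that
    # only adds nvcc flags when the toolkit is actually present.
    extra_compile_args = {"cxx": ["-O3"]}
    if debug:
        extra_compile_args["cxx"].append("-g")
    if cuda_toolkit_available():
        extra_compile_args["nvcc"] = ["-O3"]
    return extra_compile_args
```

With that gate, the same setup.py works on both CUDA and CPU-only machines; the conda install then just makes nvcc visible so the guard passes in CI.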
setup.py
this_dir = os.path.dirname(os.path.abspath(__file__))
extensions_dir = os.path.join(this_dir, "torchao", "csrc")
sources = [
    os.path.join(extensions_dir, p)
More of a nit, but should this use recursive=True so we can have kernels grouped by folders under the cuda/ folder?
@@ -0,0 +1,181 @@
#include <ATen/ATen.h>
OK, I guess as a first useful kernel to merge, I'm going to adapt this example to use paged attention.
#include <torch/library.h>
#include <torch/types.h>

TORCH_LIBRARY_FRAGMENT(torchao, m) {
Did you write this, or was it codegened?
EDIT: My actual question is: could we codegen this, the same way load_inline does?
This is not codegenned. We could codegen it, but in general codegen adds more complexity. What were you thinking for the API?
We copy-paste a CUDA kernel from torchvision for nms to serve as the example.
Thanks for sending this! I guess this needs some version guards because it requires features from the nightlies? Or is the CI failure real for 2.2?
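One lightweight way to implement such a guard (a sketch; the helper name and the 2.3 cutoff are assumptions, not what the PR ships) is to compare the numeric prefix of torch.__version__ against a minimum, tolerating nightly and local-build suffixes:

```python
def torch_version_at_least(version: str, minimum: tuple) -> bool:
    # Strip local/build suffixes such as "+cu121", then compare numeric
    # parts; a segment like "dev20240301" contributes its digits, so
    # nightlies compare greater than the matching stable release.
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= minimum
```

setup.py (or the op-registration module) could then skip or stub the new custom-op path when the guard fails, instead of erroring out on 2.2.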
This was upstreamed as part of #135