Correct usage of cuda.core._memory.Buffer? #557
Comments
I should check that the address reported by compute-sanitizer is near the integer pointer address that I get from the Buffer object.
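That check can be sketched as a small helper; the function name and addresses below are illustrative (not from the thread) — the idea is to test whether the sanitizer-reported address falls in the half-open range `[base, base + size)` of the buffer.

```python
def address_in_buffer(reported: int, base: int, size: int) -> bool:
    """Return True if a sanitizer-reported address lies inside [base, base + size)."""
    return base <= reported < base + size

# e.g. with base = int(buffer.handle) and the buffer's allocated size:
print(address_in_buffer(0x1008, 0x1000, 0x100))  # True: inside the buffer
print(address_in_buffer(0x2000, 0x1000, 0x100))  # False: past the end
```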
@carterbox I think the simplest example would look something like:
Based on your error above, my best guess would be that something is either wrong with the
I checked that the streams used at allocation time and at use time were the same, and noticed that the stream pointers I was getting from CuPy/Torch were not the same as the ones from cuda.core for the same stream. This led me to realize that I was doing something wrong when converting cuda.core objects from Python objects into addresses of the underlying C objects. For example:

```python
raw_workspace_ptr: int = buffer.handle.getPtr()
```

This is incorrect! It returns the address of the Python cuda.bindings object, not the address of the actual memory buffer. Instead we should do this:

```python
raw_workspace_ptr: int = int(buffer.handle)
```

which I guess is Pythonic, but is also not obvious, and is not mentioned in the documentation of Buffer or Stream.
This is documented here: https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#getting-the-address-of-underlying-c-objects-from-the-low-level-bindings. But I agree this isn't the clearest, and it is prone to exactly the situation you ran into.
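The distinction can be sketched with a plain ctypes analogy (no GPU needed; the names and the `0x1000` value here are illustrative, not taken from cuda.bindings): `getPtr()` is analogous to taking the host address of the handle object itself, while `int(handle)` yields the value the handle holds, i.e. the device pointer.

```python
import ctypes

# A CUdeviceptr-like handle is a C object whose *value* is the device address.
device_address = 0x1000  # illustrative device pointer value
handle = ctypes.c_uint64(device_address)

# Analogue of handle.getPtr(): the address of the handle object itself
# (a host pointer to the handle), not the memory the handle refers to.
address_of_handle = ctypes.addressof(handle)

# Analogue of int(handle): the device address stored in the handle.
value_of_handle = handle.value

print(hex(value_of_handle))               # 0x1000
print(value_of_handle == device_address)  # True
```

Passing `address_of_handle` where a device pointer is expected reproduces the invalid-address errors above, because the callee dereferences a host address as if it were device memory.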
I'm thinking I want to contribute a documentation fix which either:
Number 2 is probably the better approach?
@carterbox this was discussed in an offline meeting, and it was generally agreed that we aren't happy with the current state of things in this regard. I'm going to write up a new issue that captures the discussion and some next steps, and close this one.
We discussed further offline, and to move away from |
I am trying to allocate workspace for cublaslt using cuda.core. First, I allocate a memory Buffer like so:
Then later I pass this pointer to cublaslt via the nvmath-python bindings like so:
The problem is that when I use this Buffer abstraction from cuda.core, I get errors from the CUDA runtime. For example, when running with compute-sanitizer:
It seems to be reporting that the buffer is an invalid memory address. When I use the allocators provided by CuPy or PyTorch, there are no errors.
Looking for opinions on: