Skip to content

Fix #449: Delay construction of Python attributes #847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

mdboom
Copy link
Contributor

@mdboom mdboom commented Aug 18, 2025

Description

For the StridedMemoryView, this delays construction of the Python types shape, strides and dtype and converts the other simpler objects to regular Cython cdefs.

This has an approximately 18% speedup on StridedMemoryView object creation:

  • Baseline: Mean +- std dev: 993 ns +- 41 ns
  • Deferred Python object creation: Mean +- std dev: 821 ns +- 32 ns

This also adds tests to cover the __cuda_array_interface__ code path.

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
Contributor

copy-pr-bot bot commented Aug 18, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom
Copy link
Contributor Author

mdboom commented Aug 18, 2025

/ok to test

@leofang leofang added this to the cuda.core beta 7 milestone Aug 18, 2025
@mdboom
Copy link
Contributor Author

mdboom commented Aug 18, 2025

/ok to test

@@ -51,7 +50,7 @@ cdef class StridedMemoryView:
Pointer to the tensor buffer (as a Python `int`).
shape : tuple
Shape of the tensor.
strides : tuple
strides : Optional[tuple]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updating docs to match current behavior (not a change in behavior)

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the StridedMemoryView class by delaying the construction of Python attributes (shape, strides, and dtype) until they are actually accessed, resulting in an 18% performance improvement in object creation. The changes convert attributes to Cython cdefs and implement lazy property access.

Key changes:

  • Convert StridedMemoryView attributes to lazy properties with delayed construction
  • Add Cython cdef declarations for better performance
  • Include comprehensive test coverage for the __cuda_array_interface__ code path

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
cuda_core/cuda/core/experimental/_memoryview.pyx Implements lazy property construction and Cython optimization for StridedMemoryView
cuda_core/tests/test_utils.py Adds test coverage for the CUDA Array Interface code path

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@mdboom
Copy link
Contributor Author

mdboom commented Aug 18, 2025

/ok to test

@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module P0 High priority - Must do! and removed P1 Medium priority - Should do labels Aug 18, 2025
kkraus14
kkraus14 previously approved these changes Aug 19, 2025
@github-project-automation github-project-automation bot moved this from Todo to In Review in CCCL Aug 19, 2025
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my apology Mike, I was reviewing but got distracted yesterday, let me dump my WIP comments and I'll try to wrap up asap

@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Aug 19, 2025
@mdboom
Copy link
Contributor Author

mdboom commented Aug 19, 2025

/ok to test

@mdboom mdboom requested a review from leofang August 19, 2025 16:02
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Mike!


# Memoized properties
cdef tuple _shape
cdef object _strides
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC self._strides = None is allowed in Cython even if _strides is typed to tuple, so this might help readability (and perf too?)

Suggested change
cdef object _strides
cdef tuple _strides

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Aug 19, 2025
@property
def strides(self) -> Optional[tuple[int]]:
cdef int itemsize
if self._strides is None and self.exporting_obj is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a minor potential concern here. Unlike _shape where we can use None to indicate it's uninitialized, _strides cannot do this because None is a valid value (indicating C-contiguity) which was meant to be a fast path. Now with this check, we turn the fast path into a slow path because we keep re-running the checks (and resetting _strides to None) every time this property is requested. A dumb fix would be to add another boolean member to indicate if _strides is initialized. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

3 participants