Skip to content

Support peer access DPC++ extension #2077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

ndgrigorian
Copy link
Collaborator

This PR adds full support for the DPC++ sycl_ext_oneapi_peer_access extension

This adds a Python interface to calls that check if a dpctl.SyclDevice can enable peer access to another dpctl.SyclDevice and, if so, to enable or disable it.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from da9f596 to 20e80f1 Compare May 5, 2025 21:33
Copy link

github-actions bot commented May 5, 2025

Copy link

github-actions bot commented May 5, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_194 ran successfully.
Passed: 1109
Failed: 3
Skipped: 119

@coveralls
Copy link
Collaborator

coveralls commented May 5, 2025

Coverage Status

coverage: 84.974% (-1.4%) from 86.372%
when pulling 031cff1 on feature/support-peer-access-extension
into 3aff6ca on master.

Copy link

github-actions bot commented May 6, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_194 ran successfully.
Passed: 1109
Failed: 3
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from 20e80f1 to a21d585 Compare May 6, 2025 23:47
Copy link

github-actions bot commented May 7, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_199 ran successfully.
Passed: 1107
Failed: 5
Skipped: 119

Copy link

github-actions bot commented May 7, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_200 ran successfully.
Passed: 1109
Failed: 3
Skipped: 119

Copy link

github-actions bot commented May 7, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_201 ran successfully.
Passed: 1108
Failed: 4
Skipped: 119

@ndgrigorian
Copy link
Collaborator Author

ndgrigorian commented May 9, 2025

Noting here that for future reference, we will probably need to document how this extension interacts with sub-devices and composite devices.

For example, per composite device documentation

Interaction with the default context

If the implementation supports the sycl_ext_oneapi_default_context extension, then each platform has the notion of a default context.
This default context contains only the platform’s root devices, which are those returned by platform::get_devices.
Any composite devices for the platform are not considered root devices, and thus the default context does not include these composite devices.
If an application wants a context that contains the composite devices, then it must explicitly create such a context.

this means that while the peer access methods will work between composite device and its component devices, and vice versa, to actually leverage it, the user needs to explicitly construct a context containing component and composite devices.

also make peer access invalid when the device and peer device are the same
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_203 ran successfully.
Passed: 1105
Failed: 7
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from 04c424b to 5a92522 Compare May 12, 2025 22:12
@ndgrigorian ndgrigorian requested a review from antonwolfy May 12, 2025 22:13
@ndgrigorian ndgrigorian marked this pull request as ready for review May 12, 2025 22:13
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_205 ran successfully.
Passed: 1113
Failed: 6
Skipped: 119

1 similar comment
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_205 ran successfully.
Passed: 1113
Failed: 6
Skipped: 119

Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_206 ran successfully.
Passed: 1113
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from c8a6345 to b3ead6f Compare May 12, 2025 23:17
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_206 ran successfully.
Passed: 1114
Failed: 5
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from b3ead6f to 3e81777 Compare May 13, 2025 00:11
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_206 ran successfully.
Passed: 1113
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/support-peer-access-extension branch from 3e81777 to ba7222f Compare May 13, 2025 00:59
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_206 ran successfully.
Passed: 1113
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian requested a review from antonwolfy May 14, 2025 09:28
Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_211 ran successfully.
Passed: 1111
Failed: 8
Skipped: 119

Copy link

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_213 ran successfully.
Passed: 1112
Failed: 7
Skipped: 119

with pytest.raises(ValueError):
dev.enable_peer_access(dev)
with pytest.raises(ValueError):
dev.enable_peer_access(dev)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dev.enable_peer_access(dev)
dev.disable_peer_access(dev)

if (D && PD) {
if (_CallPeerAccess(*D, *PD)) {
try {
throw std::invalid_argument("test");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems debug leftover

Suggested change
throw std::invalid_argument("test");

auto D0 = DPCTLDeviceVector_GetAt(DV, 0);
ASSERT_TRUE(D1 != nullptr);
bool canEnable = false;
EXPECT_NO_FATAL_FAILURE(canEnable = DPCTLDevice_CanAccessPeer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add ASSERT_FALSE(canEnable) after that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and probably initial value has to be true then:

bool canEnable = true;

ASSERT_TRUE(D1 != nullptr);
bool canEnable = false;
EXPECT_NO_FATAL_FAILURE(canEnable = DPCTLDevice_CanAccessPeer(
D0, D0, DPCTLPeerAccessType::access_supported));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to repeat the call and check with DPCTLPeerAccessType::atomics_supported?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it seems access type can be also parametrized to reduce duplication, but not sure if it is easy to implement with gtest f/w.

TEST_P(TestDPCTLPeerAccess, ChkPeerAccessToSelf)
{
auto D0 = DPCTLDeviceVector_GetAt(DV, 0);
ASSERT_TRUE(D1 != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ASSERT_TRUE(D1 != nullptr);
ASSERT_TRUE(D0 != nullptr);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants