wincent8
Contributor

@wincent8 wincent8 commented Aug 22, 2025

This PR ports four test files from test/distributed/parallel and one test file from test/distributed/debug to Intel GPU.
We enable Intel GPU with the following methods, keeping the original code style wherever possible:

  1. Use torch.accelerator for generic GPU support
  2. Skip cases with known issues when running on XPU
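
A minimal sketch of how the two methods above might look in a test file. This is not taken from the PR diff: `current_device_type`, `skip_if_xpu`, and `AddExample` are hypothetical names used only for illustration, and the sketch assumes a PyTorch build that exposes `torch.accelerator` (it falls back to CPU otherwise).

```python
import unittest

try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def current_device_type() -> str:
    """Return the active accelerator type (e.g. "cuda" or "xpu"), else "cpu".

    Hypothetical helper: older PyTorch builds without torch.accelerator
    simply fall back to "cpu" here.
    """
    acc = getattr(torch, "accelerator", None) if torch is not None else None
    if acc is not None and acc.is_available():
        return acc.current_accelerator().type
    return "cpu"


DEVICE_TYPE = current_device_type()


def skip_if_xpu(reason: str):
    """Hypothetical decorator: skip a test case only when running on XPU."""
    return unittest.skipIf(DEVICE_TYPE == "xpu", reason)


class AddExample(unittest.TestCase):
    # The reason string is a placeholder, not a real issue reference.
    @skip_if_xpu("placeholder: known issue on XPU")
    def test_add(self):
        if torch is None:
            self.skipTest("PyTorch is not installed")
        # Device-agnostic allocation: no hard-coded "cuda" string.
        x = torch.ones(4, device=DEVICE_TYPE)
        self.assertEqual((x + x).sum().item(), 8.0)
```

Resolving the device type once and reusing it keeps the test body identical across CUDA, XPU, and CPU, so only the cases with known XPU issues need an explicit skip.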

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @bdhirsh @tianyu-l @XilunWu

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161261

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit c587eeb with merge base 6737e2c (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category labels Aug 22, 2025
@wincent8 wincent8 force-pushed the wliao2/add_tensor_1 branch from eb22371 to d1bc8b7 Compare August 22, 2025 10:54
@daisyden daisyden changed the title port distributed tensor parallel test files for Intel GPU [WIP]port distributed tensor parallel test files for Intel GPU Aug 28, 2025
@wincent8 wincent8 force-pushed the wliao2/add_tensor_1 branch from bb1fdc0 to 0a114ca Compare September 1, 2025 09:23
@wincent8 wincent8 changed the title [WIP]port distributed tensor parallel test files for Intel GPU port distributed tensor parallel test files for Intel GPU Sep 1, 2025
pytorch-bot bot commented Sep 2, 2025

To add the ciflow label ciflow/h100-distributed please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@daisyden daisyden added module: dtensor distributed tensor tag release notes: distributed (dtensor) release notes category keep-going Don't stop on first failure, keep running tests until the end labels Sep 2, 2025
@wincent8 wincent8 force-pushed the wliao2/add_tensor_1 branch from 0a114ca to c587eeb Compare September 2, 2025 06:10
Collaborator

@guangyey guangyey left a comment

LGTM.

@guangyey guangyey requested a review from d4l3k September 2, 2025 06:58
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Sep 2, 2025
pytorch-bot bot commented Sep 2, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Sep 2, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Sep 2, 2025
pytorch-bot bot commented Sep 2, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Sep 2, 2025
@guangyey guangyey added module: xla Related to XLA support ciflow/xpu Run XPU CI tasks labels Sep 2, 2025
Member

@d4l3k d4l3k left a comment

LGTM

@wincent8
Contributor Author

wincent8 commented Sep 3, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 3, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@guangyey
Collaborator

guangyey commented Sep 3, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/xpu (Run XPU CI tasks)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: dtensor (distributed tensor tag)
module: xla (Related to XLA support)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
open source
release notes: distributed (dtensor) (release notes category)
topic: not user facing (topic category)
Projects
Status: Done
6 participants