port distributed tensor parallel test files for Intel GPU #161261

wincent8 · 2025-08-22T10:46:00Z

This PR ports four test files under test/distributed/parallel and one test file under test/distributed/debug to Intel GPU.
We enable Intel GPU with the following methods while keeping the original code style as far as possible:

Use torch.accelerator for device-generic GPU code
Skip cases that have known issues when running on XPU
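
To illustrate the two methods above, here is a minimal sketch, not the code from this PR. It assumes a PyTorch build that provides `torch.accelerator` (falling back to `"cpu"` otherwise); `current_device_type`, `skip_if_xpu`, and `ExampleTest` are hypothetical names introduced only for this example.

```python
import unittest

try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is absent
    torch = None


def current_device_type(default="cpu"):
    """Return the active accelerator type ("cuda", "xpu", ...) or `default`."""
    accel_mod = getattr(torch, "accelerator", None) if torch is not None else None
    if accel_mod is None or not accel_mod.is_available():
        return default
    device = accel_mod.current_accelerator()
    return device.type if device is not None else default


def skip_if_xpu(reason):
    """Hypothetical helper: skip a test when the active accelerator is XPU."""
    return unittest.skipIf(current_device_type() == "xpu", reason)


class ExampleTest(unittest.TestCase):
    def test_device_type_is_string(self):
        # Device-generic code asks for the accelerator type instead of hard-coding "cuda".
        self.assertIsInstance(current_device_type(), str)

    @skip_if_xpu("known issue on XPU (placeholder reason)")
    def test_skipped_on_xpu(self):
        # Runs on every backend except XPU, where it is reported as skipped.
        self.assertNotEqual(current_device_type(), "xpu")
```

Resolving the device type in one place keeps the test bodies unchanged across backends, and the skip decorator records the XPU gap explicitly instead of silently dropping the case.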

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @bdhirsh @tianyu-l @XilunWu

pytorch-bot · 2025-08-22T10:46:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161261

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm MI2xx CI/CD workflows failing due to : download from https://github.com/api/repos/pytorch/pytorch timed out.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c587eeb with merge base 6737e2c:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

xpu / linux-jammy-xpu-n-py3.10 / test (default, 5, 8, linux.idc.xpu) (gh) (similar failure)
'Test'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2025-09-02T02:25:26Z

To add the ciflow label ciflow/h100-distributed please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

test/distributed/tensor/parallel/test_tp_examples.py

guangyey

LGTM.

pytorch-bot · 2025-09-02T06:58:53Z

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

pytorch-bot · 2025-09-02T06:59:11Z

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

d4l3k

LGTM

wincent8 · 2025-09-03T07:04:43Z

@pytorchbot merge

pytorchmergebot · 2025-09-03T07:09:46Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-09-03T13:06:02Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

guangyey · 2025-09-03T14:56:18Z

@pytorchbot merge

pytorchmergebot · 2025-09-03T14:59:56Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorch-bot bot added the oncall: distributed and topic: not user facing labels Aug 22, 2025

pytorchbot added the open source label Aug 22, 2025

wincent8 force-pushed the wliao2/add_tensor_1 branch from eb22371 to d1bc8b7 August 22, 2025 10:54

daisyden changed the title ~~port distributed tensor parallel test files for Intel GPU~~ [WIP]port distributed tensor parallel test files for Intel GPU Aug 28, 2025

wincent8 force-pushed the wliao2/add_tensor_1 branch from bb1fdc0 to 0a114ca September 1, 2025 09:23

wincent8 changed the title ~~[WIP]port distributed tensor parallel test files for Intel GPU~~ port distributed tensor parallel test files for Intel GPU Sep 1, 2025

daisyden added the ciflow/h100-distributed label Sep 2, 2025

pytorch-bot bot removed the ciflow/h100-distributed label Sep 2, 2025

daisyden added the module: dtensor, release notes: distributed (dtensor), and keep-going labels Sep 2, 2025

guangyey reviewed Sep 2, 2025

test/distributed/tensor/parallel/test_tp_examples.py (outdated, resolved)

guangyey reviewed Sep 2, 2025

test/distributed/tensor/parallel/test_tp_examples.py (outdated, resolved)

guangyey added this to PyTorch Intel Sep 2, 2025

guangyey reviewed Sep 2, 2025

test/distributed/tensor/parallel/test_tp_examples.py (outdated, resolved)

wincent8 force-pushed the wliao2/add_tensor_1 branch from 0a114ca to c587eeb September 2, 2025 06:10

wincent8 added 11 commits September 2, 2025 14:20

update for xpu

0d297fc

port for xpu

4ec3b00

add xfail

7b67ba5

update skipper

74e6d44

port for xpu

cb453ae

enable distributed tensor parallel cases for xpu

849504f

add xfail

41d9331

enable distributed tensor parallel cases for xpu

a628336

revert change of case not under parallel

f0d79cd

revert change due to known issue

094af96

enable for xpu

5d19af2

wincent8 added 3 commits September 2, 2025 14:20

update skipper according to issue fix

24b3dd0

refine code

04bc5d4

skip the whole case

c587eeb

guangyey approved these changes Sep 2, 2025

guangyey requested a review from d4l3k September 2, 2025 06:58

guangyey added the ciflow/xpu label Sep 2, 2025

pytorch-bot bot removed the ciflow/xpu label Sep 2, 2025

guangyey added the ciflow/xpu label Sep 2, 2025

pytorch-bot bot removed the ciflow/xpu label Sep 2, 2025

guangyey added the module: xla and ciflow/xpu labels Sep 2, 2025

d4l3k approved these changes Sep 2, 2025

pytorch-bot bot added the ciflow/trunk label Sep 3, 2025

pytorchmergebot added the merging label Sep 3, 2025

pytorchmergebot closed this in c157cf6 Sep 3, 2025

pytorchmergebot added the Merged label Sep 3, 2025

github-project-automation bot moved this to Done in PyTorch Intel Sep 3, 2025

pytorchmergebot removed the merging label Sep 3, 2025
