Add test c10d ucc tests #88110
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88110
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures
As of commit e86c068. This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM
All existing #38095 issues were fixed some time ago and the issue was closed; please don't re-introduce them.
Ready for review
@kit1980 Do you still have changes to request, or are you OK with the current status?
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 5 jobs have failed, first few of them are: trunk / win-vs2019-cuda11.7-py3 / test (default, 2, 6, windows.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cuda11.7-py3 / test (default, 3, 6, windows.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cuda11.7-py3 / test (default, 4, 6, windows.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cuda11.7-py3 / test (default, 5, 6, windows.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cuda11.7-py3 / test (default, 6, 6, windows.g5.4xlarge.nvidia.gpu)
Details for Dev Infra team
Raised by workflow job
…est_c10d_ucc_tests
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
…not available (#98576)
After the recent change on #88110 to add a new c10d test for UCC backend, the test starts to fail on ROCm distributed job. I guess ROCm doesn't support that backend yet, so I go ahead and disable the test there. Please let me know if the support on ROCm is coming, I will close this PR accordingly.
But it's now failing in ROCm trunk with `AssertionError: Unknown c10d backend type UCC`, for example https://hud.pytorch.org/pytorch/pytorch/commit/4adba70cc6fa273f210a94a82b337bbddffc3c1d
Pull Request resolved: #98576
Approved by: https://github.com/Fuzzkatt, https://github.com/jithunnair-amd, https://github.com/malfet, https://github.com/ZainRizvi
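For readers unfamiliar with the pattern, the following is a minimal sketch of guarding a test on backend availability so that it is skipped rather than failing with an "unknown backend" assertion. It is not the change made in #98576. The helper `ucc_available` and the class `UccSmokeTest` are hypothetical names introduced for illustration, and the sketch assumes the installed PyTorch may or may not expose `torch.distributed.is_ucc_available`, so it probes for that function defensively.

```python
import unittest


def ucc_available() -> bool:
    """Hypothetical helper: True only if c10d reports UCC support.

    Returns False when PyTorch is missing, when distributed support is not
    built in, or when no UCC availability probe is exposed.
    """
    try:
        import torch.distributed as dist
    except Exception:
        # PyTorch is not installed or failed to load in this environment.
        return False
    probe = getattr(dist, "is_ucc_available", None)
    return bool(dist.is_available() and probe is not None and probe())


class UccSmokeTest(unittest.TestCase):
    # Skip (rather than fail) on builds without the UCC backend.
    @unittest.skipUnless(ucc_available(), "c10d was not built with UCC")
    def test_ucc_reported_available(self):
        self.assertTrue(ucc_available())
```

On a build without UCC, the runner reports the test as skipped and the suite still counts as successful.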
…8110 (#99654)
* Adds extra test_allgather_base in UccProcessGroupWithDispatchedCollectivesTests; rest of nccl and gloo tests there don't work on ucc
* Adds cpu tests for [op]_work_wait_gpu tests
* Added single tensor input test for allgather_basics; multi tensor input still doesn't seem to be supported by ucc
Pull Request resolved: #99654
Approved by: https://github.com/kwen2501
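As background for the single-tensor all-gather mentioned above, here is a small pure-Python reference model of the semantics such a test typically checks: every rank contributes one equally sized input, and every rank ends up with the rank-ordered concatenation of all inputs. This is only an illustrative sketch under that assumption. It does not call PyTorch or UCC, it is not the test added in #99654, and `simulate_allgather_base` is a hypothetical name.

```python
from typing import List


def simulate_allgather_base(per_rank_inputs: List[List[int]]) -> List[List[int]]:
    """Model a single-tensor all-gather across simulated ranks.

    per_rank_inputs[r] is the flat input contributed by rank r. The result
    holds one output per rank; each output is the concatenation of all
    inputs in rank order.
    """
    sizes = {len(chunk) for chunk in per_rank_inputs}
    if len(sizes) > 1:
        raise ValueError("all ranks must contribute equally sized inputs")
    gathered = [value for chunk in per_rank_inputs for value in chunk]
    # Every rank receives its own copy of the same gathered buffer.
    return [list(gathered) for _ in per_rank_inputs]


if __name__ == "__main__":
    # Two simulated ranks, two elements each.
    print(simulate_allgather_base([[0, 1], [2, 3]]))
```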
Creates the UCC equivalent of the c10d tests in https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_gloo.py and https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_nccl.py. Uses test_c10d_gloo.py as the reference and adds all the common ops. A more detailed comparison of available ops is here: https://docs.google.com/document/d/1yPsa_X9EiEiqo-j2Yn7ierhccBtEjwoqC-B7-amI0MI/edit?usp=sharing
Also removes an extra line in the ProcessGroupUCC.cpp barrier blocking wait that was duplicated when merging #85047.
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu