
Feature distributed fused adam #184


Merged
4 commits merged into master from feature_distributed_fused_adam on Mar 21, 2025

Conversation

amd-sriram
Collaborator

Ported distributed_fused_adam from NVIDIA/apex.

Ref: https://ontrack-internal.amd.com/browse/SWDEV-519796

Updated feature of distributed fused adam from upstream. Updated its dependencies - fused adam, distributed adam. Updated the unit test case for distributed fused adam.
Raise Exception when nccl user buffer / cuda graph is used in distributed fused adam. Skipped these particular UTs
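
For orientation, the ported optimizer is used like any other torch optimizer. Below is a minimal sketch, assuming the usual apex.contrib import path from upstream NVIDIA/apex and a process group launched one process per GPU (the backend name, model, and hyperparameters are placeholders, not part of this PR):

```python
import torch
import torch.distributed as dist
from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam

# Placeholder setup: one process per GPU, launched e.g. via torchrun.
# On ROCm the "nccl" backend maps to RCCL.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = DistributedFusedAdam(model.parameters(), lr=1e-3)

# Ordinary training step; the optimizer shards its state across ranks and
# handles gradient synchronization internally.
out = model(torch.randn(8, 1024, device="cuda"))
out.sum().backward()
optimizer.step()
optimizer.zero_grad()
```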
@amd-sriram self-assigned this Mar 19, 2025
amd-sriram added a commit to ROCm/pytorch that referenced this pull request Mar 19, 2025
Altering the flag to use the correct streamType for CUDAPluggableAllocator. This is impacting Distributed Fused Adam in Rocm/APEX. See PR ROCm/apex#184
amd-sriram added a commit to ROCm/pytorch that referenced this pull request Mar 20, 2025
Altering the flag to use the correct streamType for CUDAPluggableAllocator. This is impacting Distributed Fused Adam in Rocm/APEX.

See PR ROCm/apex#184
@pruthvistony

@amd-sriram ,
Assuming it is tested and no breakages.

@pruthvistony

Please cherry-pick this into the release/1.6.0 branch.

@pruthvistony merged commit 6fd8b50 into master Mar 21, 2025
@pruthvistony deleted the feature_distributed_fused_adam branch March 21, 2025 21:29
@amd-sriram
Collaborator Author

@amd-sriram , Assuming it is tested and no breakages.

@pruthvistony: Yes, it was tested in ROCm. I also checked against upstream NVIDIA/apex to confirm that the test outcomes in ROCm/apex match.

There is one PR in PyTorch that impacts one UT and the nccl_ub feature in distributed_fused_adam.

ROCm/pytorch#1984

Kindly review this as well.

amd-sriram added a commit that referenced this pull request Mar 24, 2025
* Updated feature of distributed fused adam from upstream. Updated its dependencies - fused adam, distributed adam. Updated the unit test case for distributed fused adam.

* Raise Exception when nccl user buffer / cuda graph is used in distributed fused adam. Skipped these particular UTs

* Adding support for rccl_ub in distributed_fused_adam

* build nccl_allocator module when cuda_ext flag is mentioned
pruthvistony pushed a commit that referenced this pull request Mar 24, 2025
* Feature distributed fused adam (#184)

* Updated feature of distributed fused adam from upstream. Updated its dependencies - fused adam, distributed adam. Updated the unit test case for distributed fused adam.

* Raise Exception when nccl user buffer / cuda graph is used in distributed fused adam. Skipped these particular UTs

* Adding support for rccl_ub in distributed_fused_adam

* build nccl_allocator module when cuda_ext flag is mentioned

* Ported distributed fused lamb from upstream repo. Add support for parameters - fused_norm, full_ar, set_param_views_to_flat_buffer, skip_allgather, fuse_scale, param_order, Nccl_allgather_channels (#185)

* for distributed fused adam, add condition to remove nccl_allocator only if it is mentioned explicitly
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Mar 24, 2025
Altering the flag to use the correct streamType for
CUDAPluggableAllocator. This is impacting Distributed Fused Adam in
Rocm/APEX.

See PR ROCm/apex#184

Related Issue : https://ontrack-internal.amd.com/browse/SWDEV-519796
amd-sriram added a commit that referenced this pull request Mar 25, 2025
* Updated feature of distributed fused adam from upstream. Updated its dependencies - fused adam, distributed adam. Updated the unit test case for distributed fused adam.

* Raise Exception when nccl user buffer / cuda graph is used in distributed fused adam. Skipped these particular UTs

* Adding support for rccl_ub in distributed_fused_adam

* build nccl_allocator module when cuda_ext flag is mentioned
amd-sriram added a commit to amd-sriram/pytorch that referenced this pull request Mar 26, 2025

Altering the flag to use the correct streamType for
CUDAPluggableAllocator. This is impacting Distributed Fused Adam in
Rocm/APEX.

See PR ROCm/apex#184

Related Issue : https://ontrack-internal.amd.com/browse/SWDEV-519796
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Apr 1, 2025
Altering the flag to use the correct streamType in CUDAPluggableAllocator class for ROCm gpu. The flag TORCH_HIP_VERSION does not work for ROCm as intended. This flag is replaced with USE_ROCM. This is impacting Distributed Fused Adam in Rocm/APEX when using nccl_ub feature. This has been tested with rocm/apex.

See PR ROCm/apex#184

Pull Request resolved: #150010
Approved by: https://github.com/jeffdaily
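
For context, CUDAPluggableAllocator is the mechanism PyTorch exposes for swapping in a custom device allocator from Python, and the commit messages above indicate the nccl_allocator / nccl_ub path in distributed_fused_adam depends on it. A minimal sketch of that public entry point follows; the shared-library path and exported symbol names are placeholders, not apex's actual nccl_allocator code:

```python
import torch

# Hypothetical allocator library: the .so path and the exported C symbol
# names below are placeholders for illustration only.
allocator = torch.cuda.memory.CUDAPluggableAllocator(
    "./libcustom_alloc.so",  # compiled allocator library (placeholder)
    "custom_malloc",         # exported allocation function (placeholder)
    "custom_free",           # exported free function (placeholder)
)

# The custom allocator must be installed before any device memory is
# handed out by the caching allocator.
torch.cuda.memory.change_current_allocator(allocator)
```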
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Altering the flag to use the correct streamType in CUDAPluggableAllocator class for ROCm gpu. The flag TORCH_HIP_VERSION does not work for ROCm as intended. This flag is replaced with USE_ROCM. This is impacting Distributed Fused Adam in Rocm/APEX when using nccl_ub feature. This has been tested with rocm/apex.

See PR ROCm/apex#184

Pull Request resolved: pytorch#150010
Approved by: https://github.com/jeffdaily
amd-sriram added a commit to amd-sriram/pytorch that referenced this pull request Apr 25, 2025

Altering the flag to use the correct streamType for
CUDAPluggableAllocator. This is impacting Distributed Fused Adam in
Rocm/APEX.

See PR ROCm/apex#184

Related Issue : https://ontrack-internal.amd.com/browse/SWDEV-519796
pytorchbot pushed a commit to pytorch/pytorch that referenced this pull request May 20, 2025
Altering the flag to use the correct streamType in CUDAPluggableAllocator class for ROCm gpu. The flag TORCH_HIP_VERSION does not work for ROCm as intended. This flag is replaced with USE_ROCM. This is impacting Distributed Fused Adam in Rocm/APEX when using nccl_ub feature. This has been tested with rocm/apex.

See PR ROCm/apex#184

Pull Request resolved: #150010
Approved by: https://github.com/jeffdaily

(cherry picked from commit a19b667)
atalman pushed a commit to pytorch/pytorch that referenced this pull request May 22, 2025
[ROCm] Update CUDAPluggableAllocator.h (#1984) (#150010)

Altering the flag to use the correct streamType in CUDAPluggableAllocator class for ROCm gpu. The flag TORCH_HIP_VERSION does not work for ROCm as intended. This flag is replaced with USE_ROCM. This is impacting Distributed Fused Adam in Rocm/APEX when using nccl_ub feature. This has been tested with rocm/apex.

See PR ROCm/apex#184

Pull Request resolved: #150010
Approved by: https://github.com/jeffdaily

(cherry picked from commit a19b667)

Co-authored-by: Sriram Kumar <[email protected]>