Feature distributed fused adam #184
Conversation
@amd-sriram, please cherry-pick this into the release/1.6.0 branch.
@pruthvistony: Yes, there is one PR in PyTorch that will impact one UT and the nccl_ub feature in distributed_fused_adam. Kindly review this as well.
* Updated the distributed fused adam feature from upstream, along with its dependencies (fused adam, distributed adam). Updated the unit test case for distributed fused adam.
* Raise an exception when NCCL user buffers or CUDA graphs are used in distributed fused adam; the affected UTs are skipped (a minimal sketch of this guard follows this list).
* Added support for rccl_ub in distributed_fused_adam.
* Build the nccl_allocator module when the cuda_ext flag is specified.
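A minimal sketch of the kind of guard and test skip described above, assuming a `nccl_ub` flag and a `capturable` (CUDA graph) flag; this is illustrative, not the exact apex code.

```python
# Hypothetical sketch of the guard described above (not the exact apex code).
# Assumes `nccl_ub` and `capturable` constructor flags; ROCm is detected via
# torch.version.hip, which is None on CUDA builds.
import unittest

import torch


def check_unsupported_features(nccl_ub: bool, capturable: bool) -> None:
    """Raise if NCCL user buffers or CUDA graph capture are requested on ROCm."""
    if torch.version.hip is not None:
        if nccl_ub:
            raise RuntimeError(
                "nccl_ub (NCCL user buffers) is not supported by this build; "
                "see ROCm/apex#184."
            )
        if capturable:
            raise RuntimeError(
                "CUDA graph capture is not supported in DistributedFusedAdam "
                "on this build."
            )


# The affected unit tests are skipped on ROCm rather than failed:
@unittest.skipIf(torch.version.hip is not None, "nccl_ub unsupported on ROCm")
class TestDistributedFusedAdamNcclUb(unittest.TestCase):
    def test_nccl_ub_buffers(self):
        ...
```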
* Feature distributed fused adam (#184)
  * Updated the distributed fused adam feature from upstream, along with its dependencies (fused adam, distributed adam). Updated the unit test case for distributed fused adam.
  * Raise an exception when NCCL user buffers or CUDA graphs are used in distributed fused adam; the affected UTs are skipped.
  * Added support for rccl_ub in distributed_fused_adam.
  * Build the nccl_allocator module when the cuda_ext flag is specified (see the build sketch after this list).
* Ported distributed fused lamb from the upstream repo. Added support for the parameters fused_norm, full_ar, set_param_views_to_flat_buffer, skip_allgather, fuse_scale, param_order, and nccl_allgather_channels (#185).
* For distributed fused adam, added a condition to remove nccl_allocator only if it is mentioned explicitly.
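A minimal sketch of the conditional build mentioned above, assuming apex's `--cuda_ext` command-line convention; the extension name and source path are placeholders rather than the exact setup.py contents.

```python
# Illustrative setup.py fragment: build the nccl_allocator extension only when
# --cuda_ext is passed on the command line. Names and paths are placeholders.
import sys

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

ext_modules = []

if "--cuda_ext" in sys.argv:
    sys.argv.remove("--cuda_ext")
    ext_modules.append(
        CUDAExtension(
            name="nccl_allocator",
            # Placeholder source path for the allocator bindings.
            sources=["apex/contrib/csrc/nccl_allocator/NCCLAllocator.cpp"],
        )
    )

setup(
    name="apex",
    ext_modules=ext_modules,
    cmdclass={"build_ext": BuildExtension},
)
```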
Altering the flag to use the correct streamType for CUDAPluggableAllocator. This impacts Distributed Fused Adam in ROCm/apex. See PR ROCm/apex#184. Related issue: https://ontrack-internal.amd.com/browse/SWDEV-519796
[ROCm] Update CUDAPluggableAllocator.h (#1984) (#150010): Altering the flag to use the correct streamType in the CUDAPluggableAllocator class for ROCm GPUs. The TORCH_HIP_VERSION flag does not work as intended for ROCm, so it is replaced with USE_ROCM. This impacts Distributed Fused Adam in ROCm/apex when using the nccl_ub feature; it has been tested with rocm/apex. See PR ROCm/apex#184. Pull Request resolved: #150010. Approved by: https://github.com/jeffdaily (cherry picked from commit a19b667). Co-authored-by: Sriram Kumar <[email protected]>
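For context, a sketch of the Python-facing pluggable-allocator path that the nccl_ub feature relies on and that this fix affects on ROCm; the shared-library path and exported symbol names below are placeholders, not apex's actual ones.

```python
# Sketch using PyTorch's pluggable-allocator API
# (torch.cuda.memory.CUDAPluggableAllocator / change_current_allocator).
# The .so path and the exported malloc/free symbol names are placeholders.
import torch

alloc = torch.cuda.memory.CUDAPluggableAllocator(
    "path/to/nccl_allocator.so",  # placeholder: library exporting the callbacks
    "nccl_alloc_plug",            # placeholder: malloc(size, device, stream)
    "nccl_free_plug",             # placeholder: free(ptr, size, device, stream)
)

# Route subsequent caching-allocator traffic through the plugin. On ROCm, the
# stream argument passed to these callbacks is where the USE_ROCM change in
# CUDAPluggableAllocator.h matters.
torch.cuda.memory.change_current_allocator(alloc)
```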
Ported DistributedFusedAdam from NVIDIA/apex.
Ref: https://ontrack-internal.amd.com/browse/SWDEV-519796
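A minimal usage sketch of the ported optimizer, assuming the upstream apex module path (apex.contrib.optimizers.distributed_fused_adam) is preserved and that the process group is initialized first; only the basic constructor arguments are shown.

```python
# Minimal usage sketch; assumes the upstream apex module path is preserved and
# that torch.distributed is initialized before the optimizer is constructed.
import torch
import torch.distributed as dist
from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam

dist.init_process_group(backend="nccl")  # RCCL is used under the hood on ROCm
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = DistributedFusedAdam(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```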