Ubnext #2038

nv-akorzh · 2025-08-06T21:35:39Z

Description

Adds UBnext fast allreduce kernels to the linear and layernorm_linear layers.
They are selected through symmetric_ar_type, with 'ub_custom' as the new type.

Details

Added UC, MC, and fast-sync low-latency (Lamport) allreduce kernels.

Added a symmetric allocator that uses PyTorch symmetric memory to allocate a pool and suballocates from it.
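The pool-plus-suballocation idea can be sketched roughly as follows. This is a minimal illustration, not the allocator from this PR: the class name `PoolSuballocator`, the bump-pointer strategy, and the 256-byte alignment are all assumptions, and the sketch only tracks byte offsets rather than touching PyTorch symmetric memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PoolSuballocator:
    """Hypothetical bump-style suballocator over one pre-allocated pool."""

    pool_bytes: int
    alignment: int = 256  # assumed alignment, not taken from the PR
    _offset: int = field(default=0, init=False)

    def allocate(self, nbytes: int) -> Optional[int]:
        """Return the byte offset of a new block, or None if it does not fit."""
        # Round the current offset up to the next multiple of `alignment`.
        start = -(-self._offset // self.alignment) * self.alignment
        if start + nbytes > self.pool_bytes:
            return None  # caller is expected to fall back to another path
        self._offset = start + nbytes
        return start

    def reset(self) -> None:
        """Release every block at once by rewinding the bump pointer."""
        self._offset = 0
```

Returning `None` instead of raising keeps the caller free to pick a fallback path when the pool is exhausted.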

Because PyTorch symmetric memory does not support MNNVL yet, there is a fallback that uses the legacy UB code by creating an 11th CommOverlap object. It is enabled with the env variable NVTE_USE_UB_FOR_UBNEXT (this requires the user to initialize UB by calling initialize_ub).
NVTE_UB_MAXBATCH (default 128) can be raised to increase the batch size for which enough memory is available for the fastest kernel. If memory cannot be allocated, the fallback is gradual: first to the UBmain in-place kernel if the input could be allocated but the output could not, and then to PyTorch symmetric if the input could not be allocated.
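The gradual fallback described above could be expressed as a small decision function. This is only a sketch of the stated order: the enum `AllreducePath` and the function `select_allreduce_path` are hypothetical names introduced for illustration and do not appear in the PR.

```python
from enum import Enum


class AllreducePath(Enum):
    """Hypothetical labels for the three paths named in the description."""

    FASTEST_KERNEL = "fastest kernel"
    UB_IN_PLACE = "UBmain in-place kernel"
    TORCH_SYMMETRIC = "pytorch symmetric"


def select_allreduce_path(input_allocated: bool, output_allocated: bool) -> AllreducePath:
    """Pick a path from the allocation results, following the described order."""
    if not input_allocated:
        # Input buffer could not be allocated: fall back to PyTorch symmetric.
        return AllreducePath.TORCH_SYMMETRIC
    if not output_allocated:
        # Input fits but output does not: use the in-place kernel.
        return AllreducePath.UB_IN_PLACE
    # Both buffers were allocated: the fastest kernel can run.
    return AllreducePath.FASTEST_KERNEL
```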

The NVTE_UB_SYMM_POOL_SIZE env variable overrides the pool size with the given number of megabytes.
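For reference, the three environment variables mentioned in this description could be read as in the sketch below. The helper `read_ubnext_env` is hypothetical, and several details are assumptions rather than facts from the PR: that NVTE_USE_UB_FOR_UBNEXT is enabled by the value "1", that a megabyte means 1024 * 1024 bytes, and that an unset pool size is reported as `None` (the default pool size is not stated here).

```python
import os
from typing import Mapping, Optional, TypedDict


class UbnextEnv(TypedDict):
    use_legacy_ub: bool
    max_batch: int
    pool_bytes: Optional[int]


def read_ubnext_env(env: Mapping[str, str] = os.environ) -> UbnextEnv:
    """Parse the UBnext-related variables from `env` (hypothetical helper)."""
    pool_mb = env.get("NVTE_UB_SYMM_POOL_SIZE")
    return {
        # Assumption: the flag is switched on with the value "1".
        "use_legacy_ub": env.get("NVTE_USE_UB_FOR_UBNEXT", "0") == "1",
        # Default of 128 is taken from the description.
        "max_batch": int(env.get("NVTE_UB_MAXBATCH", "128")),
        # Assumption: 1 MB = 1024 * 1024 bytes; None means "no override".
        "pool_bytes": int(pool_mb) * 1024 * 1024 if pool_mb is not None else None,
    }
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without changing the process environment.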

Signed-off-by: Anton Korzh <[email protected]>

…ding zeros (NVIDIA#2019) * for loop Signed-off-by: Xin Yao <[email protected]> * bulk alloc Signed-off-by: Xin Yao <[email protected]> * multi-tensor swizzle Signed-off-by: Xin Yao <[email protected]> * pad zeros in swizzle kernels Signed-off-by: Xin Yao <[email protected]> * unify single- and multi-tensor swizzle Signed-off-by: Xin Yao <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix empty tensor list Signed-off-by: Xin Yao <[email protected]> * fix bug for col swizzle Signed-off-by: Xin Yao <[email protected]> * check context & fix signifiers Signed-off-by: Xin Yao <[email protected]> --------- Signed-off-by: Xin Yao <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Anton Korzh <[email protected]>

…kwise FP8 convert_and_update_tensor (NVIDIA#1978) * fix input_quantizer in save_original_input bwd Signed-off-by: Hongxiao Bai <[email protected]> * fix get shape of blockwise tensor with only compact colwise data Signed-off-by: Hongxiao Bai <[email protected]> * fix blockwise FP8 convert_and_update_tensor Signed-off-by: Hongxiao Bai <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Hongxiao Bai <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kirthi Shankar Sivamani <[email protected]> Signed-off-by: Anton Korzh <[email protected]>

Revert "[JAX] Disable TE Norm Custom Calls (NVIDIA#1993)" This reverts commit 6c97061. --------- Signed-off-by: Phuong Nguyen <[email protected]> Signed-off-by: Anton Korzh <[email protected]>

for more information, see https://pre-commit.ci Signed-off-by: Anton Korzh <[email protected]>

Signed-off-by: Anton Korzh <[email protected]>

nv-akorzh force-pushed the ubnext branch from 725e1fe to 8fded47 Compare August 6, 2025 21:39

nv-akorzh and others added 9 commits August 7, 2025 14:31

UBNext Allreduce integration

2b1fad2

Signed-off-by: Anton Korzh <[email protected]>

layernorm_linear using ubnext

1155080

Signed-off-by: Anton Korzh <[email protected]>

minor fix

8e5ba45

Signed-off-by: Anton Korzh <[email protected]>

Revert "[JAX] Disable TE Norm Custom Calls" (NVIDIA#2035)

54f092d

Revert "[JAX] Disable TE Norm Custom Calls (NVIDIA#1993)" This reverts commit 6c97061. --------- Signed-off-by: Phuong Nguyen <[email protected]> Signed-off-by: Anton Korzh <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

45d207b

for more information, see https://pre-commit.ci Signed-off-by: Anton Korzh <[email protected]>

fix output shape

214b89a

Signed-off-by: Anton Korzh <[email protected]>

tp1 fix

2b8b8c2

Signed-off-by: Anton Korzh <[email protected]>

nv-akorzh force-pushed the ubnext branch from bac6497 to 2b8b8c2 Compare August 7, 2025 21:31

denera self-requested a review August 8, 2025 14:54

Merge branch 'NVIDIA:main' into ubnext

6d3ed8d
