Conversation

@yuantailing (Member) commented May 30, 2025

DeepEP integration

Description

Support matrix:

| EP Communication method | Intra-node | Inter-node | CUDA Graphs | Notes |
|---|---|---|---|---|
| MNNVL | Y | Y | Y | Only MNNVL systems |
| DeepEP | Y | Y | N | IBGDA required for inter-node |
| DeepEPLowLatency | Y | Y | Y | IBGDA required |

Please refer to select_alltoall_method_type (in fused_moe_cutlass.py) for the conditions under which DeepEP or DeepEPLowLatency is enabled. This is an experimental feature, so the environment variable TRTLLM_CAN_USE_DEEP_EP=1 is required.

One of the following lines will be printed at initialization:

[TRT-LLM] [RANK 0] [I] CutlassFusedMoE selects alltoall_method_type <AlltoallMethodType.NotEnabled: 0>
[TRT-LLM] [RANK 0] [I] CutlassFusedMoE selects alltoall_method_type <AlltoallMethodType.MNNVL: 1>
[TRT-LLM] [RANK 0] [I] CutlassFusedMoE selects alltoall_method_type <AlltoallMethodType.DeepEP: 2>
[TRT-LLM] [RANK 0] [I] CutlassFusedMoE selects alltoall_method_type <AlltoallMethodType.DeepEPLowLatency: 3>
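The selection priority can be sketched roughly as follows. This is a hypothetical illustration, not the actual select_alltoall_method_type implementation (the real conditions live in fused_moe_cutlass.py); the `is_mnnvl_system` and `use_low_latency_kernels` parameters are made-up stand-ins for the real checks:

```python
import os
from enum import Enum


class AlltoallMethodType(Enum):
    NotEnabled = 0
    MNNVL = 1
    DeepEP = 2
    DeepEPLowLatency = 3


def select_alltoall_method_type(is_mnnvl_system: bool,
                                use_low_latency_kernels: bool) -> AlltoallMethodType:
    """Pick an alltoall method (illustrative priority order only)."""
    # MNNVL is only available on MNNVL systems.
    if is_mnnvl_system:
        return AlltoallMethodType.MNNVL
    # DeepEP is experimental and gated behind an environment variable.
    if os.environ.get("TRTLLM_CAN_USE_DEEP_EP") != "1":
        return AlltoallMethodType.NotEnabled
    if use_low_latency_kernels:
        return AlltoallMethodType.DeepEPLowLatency
    return AlltoallMethodType.DeepEP
```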

Known issues:

  1. DeepEPLowLatency + FP4 postquant_alltoall is not implemented. Set TRTLLM_MOE_POST_QUANT_ALLTOALLV=0 instead.
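Putting the above together, enabling the experimental DeepEP path while working around the known FP4 postquant issue might look like this (an illustrative snippet using only the environment variables named above):

```shell
# Enable the experimental DeepEP integration.
export TRTLLM_CAN_USE_DEEP_EP=1
# Work around known issue 1: disable FP4 postquant alltoall.
export TRTLLM_MOE_POST_QUANT_ALLTOALLV=0
```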

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@yuantailing yuantailing force-pushed the deepep branch 2 times, most recently from 9c54709 to c3b20cd Compare May 30, 2025 05:53
@kaiyux (Member) commented May 30, 2025

Notes from offline discussion

  1. It's OK not to support out-of-the-box pip install as a first step; going forward, we need to figure out a way.
  2. Use an environment variable to enable DeepEP as a first step.
  3. Add a multi-GPU end-to-end test to cover basic functionality.

@kaiyux (Member) commented May 30, 2025

/bot -h


@yuantailing (Member Author)

/bot run --stage-list "Build-Docker-Images"

@yuantailing (Member Author)

/bot -h


@tensorrt-cicd (Collaborator)

PR_Github #7071 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #7071 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5117 (Partly Tested) completed with status: 'FAILURE'

@juney-nvidia juney-nvidia changed the title Add DeepEP support for large-scale EP feat: large-scale EP(part 7: DeepEP integration) Jun 1, 2025
@kaiyux (Member) commented Jun 3, 2025

PR_Github #7071 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #5117 (Partly Tested) completed with status: 'FAILURE'

@yuantailing Please fix the style following the guidance https://github.com/NVIDIA/TensorRT-LLM/blob/main/CONTRIBUTING.md#coding-style

@yuantailing yuantailing force-pushed the deepep branch 3 times, most recently from 09f4f14 to ec39fda Compare June 3, 2025 02:44
@yuantailing (Member Author)

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd (Collaborator)

PR_Github #7270 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #7270 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5268 (Partly Tested) completed with status: 'FAILURE'

@yuantailing yuantailing force-pushed the deepep branch 2 times, most recently from 8995de6 to 9c5fc69 Compare June 3, 2025 06:16
@yuantailing (Member Author)

/bot run --disable-fail-fast --stage-list "DGX_H100-4_GPUs-PyTorch-DeepSeek-1"

@tensorrt-cicd (Collaborator)

PR_Github #8785 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8785 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6381 (Partly Tested) completed with status: 'FAILURE'

@yuantailing (Member Author) commented Jun 13, 2025

OOM on a previously passing test case: DGX_H100-4_GPUs-PyTorch-DeepSeek-1/accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]

The test case passed in PR_Github #8651.

There was no code change between these two CI runs.

[2025-06-13T14:06:48.472Z] [06/13/2025-12:08:49] [TRT-LLM] [RANK 3] [E] Encountered an error in forward function: BatchQKApplyRotaryPosIdsCosSinCache failed with error code out of memory
[2025-06-13T14:06:48.472Z] [06/13/2025-12:08:49] [TRT-LLM] [RANK 3] [E] Error in event loop: Cannot get the result for a response with an error: BatchQKApplyRotaryPosIdsCosSinCache failed with error code out of memory (/home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/cpp/tensorrt_llm/executor/responseImpl.h:73)
[2025-06-13T14:06:48.472Z] 1       0x7fa406cd7798 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x1dae798) [0x7fa406cd7798]
[2025-06-13T14:06:48.472Z] 2       0x7fa478bb1727 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x19f727) [0x7fa478bb1727]
[2025-06-13T14:06:48.472Z] 3       0x7fa478adc003 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0xca003) [0x7fa478adc003]
[2025-06-13T14:06:48.472Z] 4             0x58208f /usr/bin/python() [0x58208f]
[2025-06-13T14:06:48.472Z] 5             0x549185 _PyObject_MakeTpCall + 117
[2025-06-13T14:06:48.472Z] 6             0x54ce86 /usr/bin/python() [0x54ce86]
[2025-06-13T14:06:48.472Z] 7             0x5a17b4 /usr/bin/python() [0x5a17b4]
[2025-06-13T14:06:48.472Z] 8             0x6786c8 /usr/bin/python() [0x6786c8]
[2025-06-13T14:06:48.472Z] 9             0x5a4f12 /usr/bin/python() [0x5a4f12]
[2025-06-13T14:06:48.472Z] 10            0x581fa2 /usr/bin/python() [0x581fa2]
[2025-06-13T14:06:48.472Z] 11            0x54a86d PyObject_CallOneArg + 77
[2025-06-13T14:06:48.472Z] 12            0x623635 /usr/bin/python() [0x623635]
[2025-06-13T14:06:48.472Z] 13            0x6244af /usr/bin/python() [0x6244af]
[2025-06-13T14:06:48.472Z] 14            0x622f2f /usr/bin/python() [0x622f2f]
[2025-06-13T14:06:48.472Z] 15            0x6c61ca /usr/bin/python() [0x6c61ca]
[2025-06-13T14:06:48.472Z] 16            0x581f0d /usr/bin/python() [0x581f0d]
[2025-06-13T14:06:48.472Z] 17      0x7fa89305f2cb /usr/local/lib/python3.12/dist-packages/mpi4py/MPI.cpython-312-x86_64-linux-gnu.so(+0x5b2cb) [0x7fa89305f2cb]
[2025-06-13T14:06:48.472Z] 18      0x7fa8930855ef /usr/local/lib/python3.12/dist-packages/mpi4py/MPI.cpython-312-x86_64-linux-gnu.so(+0x815ef) [0x7fa8930855ef]
[2025-06-13T14:06:48.472Z] 19      0x7fa8931020ed /usr/local/lib/python3.12/dist-packages/mpi4py/MPI.cpython-312-x86_64-linux-gnu.so(+0xfe0ed) [0x7fa8931020ed]
[2025-06-13T14:06:48.472Z] 20      0x7fa893105f8a /usr/local/lib/python3.12/dist-packages/mpi4py/MPI.cpython-312-x86_64-linux-gnu.so(+0x101f8a) [0x7fa893105f8a]
[2025-06-13T14:06:48.472Z] 21            0x551598 /usr/bin/python() [0x551598]
[2025-06-13T14:06:48.472Z] 22            0x549b85 PyObject_Vectorcall + 53
[2025-06-13T14:06:48.472Z] 23            0x5d73c9 _PyEval_EvalFrameDefault + 2697
[2025-06-13T14:06:48.472Z] 24            0x54cd32 /usr/bin/python() [0x54cd32]
[2025-06-13T14:06:48.472Z] 25            0x5db55b _PyEval_EvalFrameDefault + 19483
[2025-06-13T14:06:48.472Z] 26            0x54cd32 /usr/bin/python() [0x54cd32]
[2025-06-13T14:06:48.472Z] 27            0x6f826c /usr/bin/python() [0x6f826c]
[2025-06-13T14:06:48.472Z] 28            0x6b917c /usr/bin/python() [0x6b917c]
[2025-06-13T14:06:48.472Z] 29      0x7fa8948d3aa4 /lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fa8948d3aa4]
[2025-06-13T14:06:48.472Z] 30      0x7fa894960c3c /lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fa894960c3c]
[2025-06-13T14:06:48.472Z] [06/13/2025-12:08:49] [TRT-LLM] [RANK 3] [E] Traceback (most recent call last):
[2025-06-13T14:06:48.472Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1615, in _forward_step
[2025-06-13T14:06:48.472Z]     outputs = forward(scheduled_requests, self.resource_manager,
[2025-06-13T14:06:48.472Z]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-06-13T14:06:48.472Z]   File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
[2025-06-13T14:06:48.472Z]     result = func(*args, **kwargs)
[2025-06-13T14:06:48.472Z]              ^^^^^^^^^^^^^^^^^^^^^
[2025-06-13T14:06:48.472Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1605, in forward
[2025-06-13T14:06:48.472Z]     return self.model_engine.forward(
[2025-06-13T14:06:48.472Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^

@kaiyux (Member) commented Jun 13, 2025

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-DeepSeek-1"

@tensorrt-cicd (Collaborator)

PR_Github #8822 [ run ] triggered by Bot

@yuantailing (Member Author)

Comparing the environments of PR_Github #8651 and PR_Github #8785:

Pipeline 8651 installed optimum-1.25.3
Pipeline 8785 installed optimum-1.26.1

Pipeline 8651:

Successfully installed DataProperty-1.1.0 StrEnum-0.4.15 accelerate-1.7.0 aenum-3.1.16 alembic-1.16.1 backoff-2.2.1 bandit-1.7.7 blake3-1.0.5 cfgv-3.4.0 chardet-5.2.0 click_option_group-0.5.7 colorama-0.4.6 colored-2.3.0 colorlog-6.9.0 coverage-7.9.0 cramjam-2.10.0 datasets-3.1.0 diffusers-0.33.1 dill-0.3.8 distlib-0.3.9 distro-1.9.0 docstring_parser-0.16 etcd3-0.12.0 evaluate-0.4.3 fastapi-0.115.4 fastparquet-2024.11.0 flashinfer-python-0.2.5 fsspec-2024.9.0 genai-perf-0.0.13 graphviz-0.20.3 greenlet-3.2.3 h5py-3.12.1 hf-xet-1.1.3 httpcore-1.0.9 huggingface-hub-0.33.0 identify-2.6.12 jieba-0.42.1 jiter-0.10.0 jsonlines-4.0.0 kaleido-0.2.1 lark-1.2.2 lm_eval-0.4.8 lxml-5.4.0 mako-1.3.10 mbstrdecoder-1.1.4 multiprocess-0.70.16 mypy-1.16.0 narwhals-1.42.1 nltk-3.9.1 nodeenv-1.9.1 numexpr-2.11.0 nvidia-cuda-nvrtc-cu12-12.9.86 nvidia-modelopt-0.31.0 nvidia-modelopt-core-0.31.0 nvidia-nccl-cu12-2.27.3 onnx_graphsurgeon-0.5.8 openai-1.86.0 optimum-1.25.3 optuna-3.6.1 ordered-set-4.1.0 oyaml-1.0 parameterized-0.9.0 pathvalidate-3.2.3 patsy-1.0.1 pbr-6.1.1 peft-0.15.2 pillow-10.3.0 plotly-6.1.2 portalocker-3.1.1 pre-commit-4.2.0 py-1.11.0 pybind11-stubgen-2.5.4 pytablewriter-1.2.1 pytest-8.4.0 pytest-asyncio-1.0.0 pytest-cov-6.2.1 pytest-csv-3.0.0 pytest-env-1.1.5 pytest-forked-1.6.0 pytest-mock-3.14.1 pytest-split-0.10.0 pytest-threadleak-0.5.0 pytest-timeout-2.4.0 python-rapidjson-1.20 responses-0.25.7 rouge-1.0.1 rouge_score-0.1.2 ruff-0.9.4 sacrebleu-2.5.1 sentencepiece-0.2.0 sqlalchemy-2.0.41 sqlitedict-2.1.0 starlette-0.41.3 statsmodels-0.14.4 stevedore-5.4.1 tabledata-1.3.4 tcolorpy-0.1.7 tenacity-9.1.2 tiktoken-0.9.0 tokenizers-0.21.1 tqdm-multiprocess-0.0.11 transformers-4.51.3 tritonclient-2.58.0 typepy-1.3.4 uvicorn-0.34.3 virtualenv-20.31.2 word2number-1.1 xgrammar-0.1.16 xxhash-3.5.0 zstandard-0.23.0

Pipeline 8785:

Successfully installed DataProperty-1.1.0 StrEnum-0.4.15 accelerate-1.7.0 aenum-3.1.16 alembic-1.16.1 backoff-2.2.1 bandit-1.7.7 blake3-1.0.5 cfgv-3.4.0 chardet-5.2.0 click_option_group-0.5.7 colorama-0.4.6 colored-2.3.0 colorlog-6.9.0 coverage-7.9.0 cramjam-2.10.0 datasets-3.1.0 diffusers-0.33.1 dill-0.3.8 distlib-0.3.9 distro-1.9.0 docstring_parser-0.16 etcd3-0.12.0 evaluate-0.4.3 fastapi-0.115.4 fastparquet-2024.11.0 flashinfer-python-0.2.5 fsspec-2024.9.0 genai-perf-0.0.13 graphviz-0.20.3 greenlet-3.2.3 h5py-3.12.1 hf-xet-1.1.3 httpcore-1.0.9 huggingface-hub-0.33.0 identify-2.6.12 jieba-0.42.1 jiter-0.10.0 jsonlines-4.0.0 kaleido-0.2.1 lark-1.2.2 lm_eval-0.4.8 lxml-5.4.0 mako-1.3.10 mbstrdecoder-1.1.4 multiprocess-0.70.16 mypy-1.16.0 narwhals-1.42.1 nltk-3.9.1 nodeenv-1.9.1 numexpr-2.11.0 nvidia-cuda-nvrtc-cu12-12.9.86 nvidia-modelopt-0.31.0 nvidia-modelopt-core-0.31.0 nvidia-nccl-cu12-2.27.3 onnx_graphsurgeon-0.5.8 openai-1.86.0 optimum-1.26.1 optuna-3.6.1 ordered-set-4.1.0 oyaml-1.0 parameterized-0.9.0 pathvalidate-3.2.3 patsy-1.0.1 pbr-6.1.1 peft-0.15.2 pillow-10.3.0 plotly-6.1.2 portalocker-3.1.1 pre-commit-4.2.0 py-1.11.0 pybind11-stubgen-2.5.4 pytablewriter-1.2.1 pytest-8.4.0 pytest-asyncio-1.0.0 pytest-cov-6.2.1 pytest-csv-3.0.0 pytest-env-1.1.5 pytest-forked-1.6.0 pytest-mock-3.14.1 pytest-split-0.10.0 pytest-threadleak-0.5.0 pytest-timeout-2.4.0 python-rapidjson-1.20 responses-0.25.7 rouge-1.0.1 rouge_score-0.1.2 ruff-0.9.4 sacrebleu-2.5.1 sentencepiece-0.2.0 sqlalchemy-2.0.41 sqlitedict-2.1.0 starlette-0.41.3 statsmodels-0.14.4 stevedore-5.4.1 tabledata-1.3.4 tcolorpy-0.1.7 tenacity-9.1.2 tiktoken-0.9.0 tokenizers-0.21.1 tqdm-multiprocess-0.0.11 transformers-4.51.3 tritonclient-2.58.0 typepy-1.3.4 uvicorn-0.34.3 virtualenv-20.31.2 word2number-1.1 xgrammar-0.1.16 xxhash-3.5.0 zstandard-0.23.0
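Version drift like the optimum bump above can be spotted automatically by diffing the two pip "Successfully installed" lines. A small standalone helper (illustrative only, not part of the CI tooling) might look like:

```python
def parse_installed(pip_line: str) -> dict:
    """Parse a pip 'Successfully installed ...' line into {package: version}."""
    prefix = "Successfully installed "
    if pip_line.startswith(prefix):
        pip_line = pip_line[len(prefix):]
    packages = {}
    for token in pip_line.split():
        # The version is everything after the last '-', e.g. 'optimum-1.25.3'.
        name, _, version = token.rpartition("-")
        packages[name] = version
    return packages


def diff_envs(line_a: str, line_b: str) -> dict:
    """Return {package: (version_a, version_b)} for packages that differ."""
    a, b = parse_installed(line_a), parse_installed(line_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```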

@tensorrt-cicd (Collaborator)

PR_Github #8822 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6410 (Partly Tested) completed with status: 'FAILURE'

@yuantailing (Member Author)

Build timeout. Note that #5027 changed BUILD_JOBS from 8 to 4, which may make the build take about twice as long.

@yuantailing (Member Author)

Maybe the second build can reuse ccache. Running again.

@yuantailing (Member Author)

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-DeepSeek-1"

@tensorrt-cicd (Collaborator)

PR_Github #8866 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8866 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6453 (Partly Tested) completed with status: 'FAILURE'

@yuantailing (Member Author) commented Jun 14, 2025

ToT failure in the Check Test Lists phase; it should be fixed by #5205.

Merging main and testing again.

@yuantailing (Member Author)

/bot run --disable-fail-fast --stage-list "DGX_H100-4_GPUs-PyTorch-DeepSeek-1"

@tensorrt-cicd (Collaborator)

PR_Github #8873 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8873 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6460 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@yuantailing (Member Author)

The rerun test is DGX_H100-4_GPUs-PyTorch-DeepSeek-1/disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_attention_dp_overlap_cuda_graph[DeepSeek-V3-Lite-fp8].

I noticed that PR #5140 reran DGX_H100-4_GPUs-PyTorch-DeepSeek-1/disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_bf16_cache_aware_balance[DeepSeek-V3-Lite-bf16] (rerun report).

Both reruns happened in the same file and have the same call stack, so I believe the root cause is a high ToT failure rate in test_disaggregated.py. Therefore, ignoring the rerun in this pull request is reasonable.

Appendix: call stack

Traceback (most recent call last):
  File "/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/llmVanilla/TensorRT-LLM/src/examples/disaggregated/clients/disagg_client.py", line 189, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/llmVanilla/TensorRT-LLM/src/examples/disaggregated/clients/disagg_client.py", line 182, in main
    responses = await asyncio.gather(*tasks)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/llmVanilla/TensorRT-LLM/src/examples/disaggregated/clients/disagg_client.py", line 45, in send_request
    raise Exception(f"Error: {await response.text()}")
Exception: Error: {"detail":"Internal server error [Errno 32] Broken pipe"}

@yuantailing (Member Author)

/bot skip --comment "PR_Github #8541, PR_Github #8651, and PR_Github #8873 form a full test. The main branch grows 39 commits from the first test."

@yuantailing yuantailing enabled auto-merge (squash) June 14, 2025 09:41
@tensorrt-cicd (Collaborator)

PR_Github #8883 [ skip ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8883 [ skip ] completed with state SUCCESS
Skipping testing for commit 4181a6e

@yuantailing yuantailing disabled auto-merge June 14, 2025 09:56
@kaiyux kaiyux enabled auto-merge (squash) June 14, 2025 11:11
@kaiyux kaiyux merged commit 0b60da2 into NVIDIA:main Jun 14, 2025
3 checks passed
@WanchaoYao commented Jul 8, 2025

@yuantailing Hi, I tried to enable DeepEP and found that num_nvl_peers and comm are not parameters of DeepEP's Buffer init function, so I guess you modified DeepEP's source code? ---- I've figured out how to install the modified DeepEP; please see docker/common/install_deep_ep.sh.

@yuantailing yuantailing deleted the deepep branch July 23, 2025 07:42
@yuantailing (Member Author)

Hi @WanchaoYao,
Sorry for the late reply. Glad that you found the installation script. Also, we have since made tensorrt_llm.deep_ep self-contained (#5534) by compiling NVSHMEM and DeepEP at build time, so a system-wide DeepEP installation is no longer required, FYI.
