[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request #5080

amitz-nv · 2025-06-10T08:26:42Z

Description

Make the PyTorch LoRA flow broadcast the weights and config tensors only when a LoRA adapter is new. When the adapter is already loaded on rank 0 (and should therefore be loaded on the other ranks as well), only the adapter ID is sent in the request.

NOTE: With this PR, requests that use a LoRA adapter previously unloaded from the LoRA CPU cache no longer work, because the adapter remains in _cpp_lora_weights in the LoraManager Python class. It was decided not to support this flow for now, in exchange for the significant performance boost.
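The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `LoraRequest`, `Rank0Sender`, and `Worker` are hypothetical names, and plain `bytes` stand in for the weights and config tensors. It only shows the idea of attaching the tensors the first time an adapter ID is seen and sending the bare ID afterwards, including the unsupported case from the NOTE.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class LoraRequest:
    """Hypothetical request payload (not the real TensorRT-LLM type)."""
    adapter_id: int
    weights: Optional[bytes] = None  # stands in for the weights tensor
    config: Optional[bytes] = None   # stands in for the config tensor


class Rank0Sender:
    """Hypothetical rank-0 side: remembers which adapter IDs were already sent."""

    def __init__(self, adapters: Dict[int, Tuple[bytes, bytes]]):
        self._adapters = adapters      # adapter_id -> (weights, config)
        self._sent: Set[int] = set()

    def build_request(self, adapter_id: int) -> LoraRequest:
        if adapter_id in self._sent:
            # Adapter was sent before: send only the ID.
            return LoraRequest(adapter_id)
        weights, config = self._adapters[adapter_id]
        self._sent.add(adapter_id)
        return LoraRequest(adapter_id, weights, config)


class Worker:
    """Hypothetical receiving rank with a local adapter cache."""

    def __init__(self) -> None:
        self.cache: Dict[int, Tuple[bytes, bytes]] = {}

    def handle(self, req: LoraRequest) -> Tuple[bytes, bytes]:
        if req.weights is not None and req.config is not None:
            self.cache[req.adapter_id] = (req.weights, req.config)
        elif req.adapter_id not in self.cache:
            # Mirrors the unsupported flow in the NOTE: the sender believes
            # the adapter is loaded, but it is no longer in the cache.
            raise KeyError(f"adapter {req.adapter_id} is not cached")
        return self.cache[req.adapter_id]


sender = Rank0Sender({7: (b"weights", b"config")})
worker = Worker()

first = sender.build_request(7)    # carries weights and config
worker.handle(first)
second = sender.build_request(7)   # carries the adapter ID only
worker.handle(second)
```

In this sketch the sender never forgets an ID it has sent, so clearing `worker.cache` and sending the same ID again raises `KeyError`. That is the trade-off the NOTE accepts in return for smaller requests.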

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

amitz-nv · 2025-06-10T08:28:51Z

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1"

tensorrt-cicd · 2025-06-10T08:34:35Z

PR_Github #8251 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-10T14:59:42Z

PR_Github #8251 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5977 (Partly Tested) completed with status: 'FAILURE'

amitz-nv · 2025-06-19T11:09:24Z

/bot run

tensorrt-cicd · 2025-06-19T11:15:08Z

PR_Github #9501 [ run ] triggered by Bot

amitz-nv · 2025-06-19T13:58:30Z

/bot run

tensorrt-cicd · 2025-06-19T14:03:50Z

PR_Github #9518 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-19T14:03:52Z

PR_Github #9501 [ run ] completed with state ABORTED

shaharmor98 · 2025-06-19T14:27:03Z

/bot run

tensorrt-cicd · 2025-06-19T14:33:07Z

PR_Github #9522 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-19T14:33:09Z

PR_Github #9518 [ run ] completed with state ABORTED

amitz-nv · 2025-06-19T15:31:22Z

/bot run

tensorrt-cicd · 2025-06-19T15:37:17Z

PR_Github #9529 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-19T15:37:19Z

PR_Github #9522 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-19T18:00:11Z

PR_Github #9529 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6992 completed with status: 'FAILURE'

shaharmor98

One small last comment.
LGTM

shaharmor98 · 2025-06-24T03:42:35Z

/bot run

tensorrt-cicd · 2025-06-24T03:48:02Z

PR_Github #9635 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T07:04:55Z

PR_Github #9635 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7082 completed with status: 'FAILURE'

amitz-nv · 2025-06-24T07:29:08Z

/bot run

tensorrt-cicd · 2025-06-24T07:34:35Z

PR_Github #9654 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T09:34:46Z

PR_Github #9654 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7097 completed with status: 'FAILURE'

amitz-nv · 2025-06-24T10:44:42Z

/bot run

tensorrt-cicd · 2025-06-24T10:49:50Z

PR_Github #9697 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T12:28:57Z

PR_Github #9697 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7132 completed with status: 'FAILURE'

amitz-nv · 2025-06-24T12:38:55Z

/bot run

tensorrt-cicd · 2025-06-24T12:44:46Z

PR_Github #9710 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T15:44:14Z

PR_Github #9710 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7145 completed with status: 'FAILURE'

shaharmor98 · 2025-06-25T06:08:47Z

/bot run

tensorrt-cicd · 2025-06-25T06:14:03Z

PR_Github #9818 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-25T09:42:16Z

PR_Github #9818 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7244 completed with status: 'FAILURE'

amitz-nv · 2025-06-25T11:30:54Z

/bot run

tensorrt-cicd · 2025-06-25T11:38:14Z

PR_Github #9874 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-25T23:41:52Z

PR_Github #9874 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7286 completed with status: 'SUCCESS'

amitz-nv changed the title ~~Fix LoRA broadcast issue in pytorch flow~~ Fix LoRA pytorch to broadcast weights & config tensors only on new LoRA adapter Jun 19, 2025

amitz-nv force-pushed the dev-fix-lora-torch-broadcast branch from 2be6ca3 to c16bf53 Compare June 19, 2025 11:08

amitz-nv changed the title ~~Fix LoRA pytorch to broadcast weights & config tensors only on new LoRA adapter~~ [TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request Jun 19, 2025

amitz-nv requested a review from shaharmor98 June 19, 2025 11:15

shaharmor98 reviewed Jun 19, 2025

tensorrt_llm/lora_manager.py

tensorrt_llm/lora_manager.py

tensorrt_llm/executor/worker.py (outdated)

amitz-nv requested a review from shaharmor98 June 19, 2025 11:22

amitz-nv force-pushed the dev-fix-lora-torch-broadcast branch from b458fb1 to 9d5476e Compare June 19, 2025 13:58

amitz-nv marked this pull request as ready for review June 19, 2025 14:45

shaharmor98 approved these changes Jun 24, 2025

tensorrt_llm/executor/worker.py (outdated)

amitz-nv force-pushed the dev-fix-lora-torch-broadcast branch from 9d5476e to f651c79 Compare June 24, 2025 07:28

amitz-nv force-pushed the dev-fix-lora-torch-broadcast branch from f651c79 to a13fb4a Compare June 24, 2025 10:44

amitz-nv added 5 commits June 25, 2025 14:30

Sending LoRA adapter weights & config tensors on request only when adapter wasn't previously loaded

007e181

Signed-off-by: Amit Zuker <[email protected]>

Improve _load_lora_adapter clarity

945886e

Signed-off-by: Amit Zuker <[email protected]>

Added docstring about the return value to load_from_ckpt, load_from_hf, load_from_nemo

0f528e3

Signed-off-by: Amit Zuker <[email protected]>

Simplify _load_lora_adapter return

97a3634

Signed-off-by: Amit Zuker <[email protected]>

Fix load_from_hf and load_from_nemo to return an empty list when no new uids are found

eb2fd3c

Signed-off-by: Amit Zuker <[email protected]>

amitz-nv force-pushed the dev-fix-lora-torch-broadcast branch from 03e0751 to eb2fd3c Compare June 25, 2025 11:30

shaharmor98 merged commit e0bb123 into NVIDIA:main Jun 26, 2025
3 checks passed

amitz-nv mentioned this pull request Jun 30, 2025

[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction #5616

Merged

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

bd9c588

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

5cfa7d1

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

90e3a05

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

9d5ff8d

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

472f6b6

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

9ab2f98

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

2fec59f

Signed-off-by: Amit Zuker <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in each request (NVIDIA#5080)

cebe18b

Signed-off-by: Amit Zuker <[email protected]>
