257 commits
1753202
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
amitz-nv Jun 19, 2025
21ce9b6
test: add qwen3 cases (#5302)
ruodil Jun 19, 2025
e22e884
test: amend test case name in perf cluster test (#5356)
ruodil Jun 19, 2025
b558232
Refactor CutlassFusedMoE (#5344)
hlu1 Jun 19, 2025
493f268
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
EmmaQiaoCh Jun 19, 2025
bca758f
fix: Fix DS-R1 nvfp4 test case naming (#5361)
syuoni Jun 19, 2025
68687a9
[WAR][nvbug/5321947] Add an async sleep to unblock event loop. (#5342)
FrankD412 Jun 19, 2025
9a53e58
blog: Disaggregated Serving in TensorRT-LLM (#5353)
Shixiaowei02 Jun 19, 2025
c7af650
Fix: fix the deterministic issue in the MTP Eagle path (#5285)
lfr-0531 Jun 19, 2025
1e35be5
doc: subsequent modifications of blog 5 (#5366)
Shixiaowei02 Jun 19, 2025
7246fd7
feat: Support stream_interval (#5284)
kaiyux Jun 19, 2025
113f6fb
Fix: missing clientId when serialize and deserialize response (#5231)
kaiyux Jun 19, 2025
9bd42ec
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#…
Superjomn Jun 19, 2025
b1878ea
Add Wechat_Group_QR_Code.png to docs/source/media and main page of TR…
AdamzNV Jun 19, 2025
5d4ab47
fix: refactor and fix mtp vanilla (#4762)
lfr-0531 Jun 19, 2025
4f0f17a
feat: Misc Opt for large scale EP (#5374)
dongxuy04 Jun 20, 2025
b3045c4
refactor: remove TrtGptModelOptionalParams (#5165)
Funatiq Jun 20, 2025
ebadc13
[doc] update mtp documents (#5387)
lfr-0531 Jun 21, 2025
58a8a8f
feature: unify new_tokens format sample state to trtllm sampler new_t…
netanel-haber Jun 23, 2025
e16c1be
[fix] Add 1 and draft_token_num to seq_len when overlap scheduling is…
HuiGao-NV Jun 24, 2025
4b32a3f
test: [CI] remove closed bugs (#5400)
xinhe-nv Jun 24, 2025
e2a8cbc
refactor: manage cache indirection in decoder state (#5315)
Funatiq Jun 24, 2025
658fb5b
tests: update benchmark test lists (#5365)
xinhe-nv Jun 24, 2025
d26040e
chore: delete mamba hybrid, since it is now called NemotronH (#5409)
vegaluisjose Jun 24, 2025
4752720
[Infra] - Waive failed tests in post-merge and increase some timeout …
EmmaQiaoCh Jun 24, 2025
35a92f6
Add debug hook to support dump tensor data and add new debug function…
HuiGao-NV Jun 24, 2025
d93a5e0
Chore: remove unused variables (#5314)
QiJune Jun 24, 2025
846bbf1
Fix test Pytorch model engine (#5416)
Tabrizian Jun 24, 2025
6995200
Add MTP support for Online EPLB (#5213)
dongxuy04 Jun 24, 2025
241f921
waive test_moe.py::test_moe_fp8[autotune] (#5455)
QiJune Jun 25, 2025
73ba4fc
fix: fix bug of qwen3 + eagle3 + finalize_moe_fusion (#5369)
byshiue Jun 25, 2025
5cffb7e
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
lucaslie Jun 25, 2025
d535489
feat: Dynamically remove servers in PD (#5270)
Shunkangz Jun 25, 2025
da98e03
tests: Set kv cache free memory fraction in test case (#5433)
HuiGao-NV Jun 25, 2025
76da7fe
fix (NvBug 5354925): Fix static EPLB (#5411)
syuoni Jun 25, 2025
fc7a81c
test: Add LLGuidance test and refine guided decoding (#5348)
syuoni Jun 25, 2025
478f668
CI: update multi gpu test triggering file list (#5466)
QiJune Jun 25, 2025
3ca2f6a
start OAIServer with `max_beam_width=1` for TorchSampler (#5427)
netanel-haber Jun 25, 2025
f3cfe86
chore: bump version to 1.0.0rc1 (#5460)
yiqingy0 Jun 25, 2025
1f292ff
[https://jirasw.nvidia.com/browse/TRTLLM-4645] support mutliCtasKvMod…
PerkzZheng Jun 25, 2025
2901c5a
CI: waive test_ad_build_small_multi (#5471)
QiJune Jun 25, 2025
b3a4c1f
feat: Remove not used padding_idx in models (#5385)
HuiGao-NV Jun 25, 2025
d6ada5f
[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436)
kaiyux Jun 25, 2025
cc3c2b3
Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457)
HuiGao-NV Jun 25, 2025
314f15f
Fix: fix nvbug 5356427 (#5464)
HuiGao-NV Jun 25, 2025
c5ae327
feat: Make benchmark_serving part of the library (#5428)
kaiyux Jun 25, 2025
205c97a
[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (…
dcampora Jun 25, 2025
5bc8c89
[chore] Disable block reuse when draft model speculation is being use…
mikeiovine Jun 25, 2025
3a2c4ca
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
QiJune Jun 25, 2025
61bb71f
[fix][test] remove test in global scope (#5470)
omera-nv Jun 25, 2025
bdc8dfe
[fix][ci] dont build wheel for cpp tests (#5443)
omera-nv Jun 25, 2025
feaf789
CI: reduce BF16 test cases in B200 (#5482)
QiJune Jun 25, 2025
1e4fa13
Add sleep function for disagg gen-only benchmarking (#5398)
qiaoxj07 Jun 25, 2025
74ae15a
CI: enable test cases on single device type (#5484)
HuiGao-NV Jun 26, 2025
3fc5754
[5356427] fix: Remove the seq_len of 4096 from FP8 block scale MoE tu…
hyukn Jun 26, 2025
578dbc8
feat: chunked prefill for MLA (Blackwell) (#4651)
jmydurant Jun 26, 2025
d135f59
Add unit test for routing kernels (#5405)
ChristinaZ Jun 26, 2025
d9b75f8
[CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-at…
venkywonka Jun 26, 2025
32d1573
[Infra] - Add timeout setting for long tests found in post-merge (#5501)
EmmaQiaoCh Jun 26, 2025
6aef149
Revert "feature: unify new_tokens format sample state to trtllm sampe…
netanel-haber Jun 26, 2025
e9cd810
keep sm90 headsize 128 cubins (#5320)
qsang-nv Jun 26, 2025
9428414
opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222)
djns99 Jun 26, 2025
9ee3360
[TRTLLM-6019] feat: Remove cutlass min latency code from AutoTuner. (…
hyukn Jun 26, 2025
e0bb123
[TRTLLM-5921][feat] Prevent serialization of entire LoRA adapters in …
amitz-nv Jun 26, 2025
490d2e5
feat: large-scale EP(part 8: Online EP load balancer integration for …
dongxuy04 Jun 26, 2025
7e681fb
[chore] Allow configuring linking of NVRTC wrapper (#5189)
AlessioNetti Jun 26, 2025
1bab900
perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)
bobboli Jun 26, 2025
fa0ea92
[fix][ci] trigger multigpu tests for deepseek changes (#5423)
omera-nv Jun 26, 2025
ff2dd72
tests: waive tests (#5458)
xinhe-nv Jun 26, 2025
749393e
doc: Fix benchmark cmd in disagg scripts (#5515)
kaiyux Jun 26, 2025
0788c5d
[perf] improve XQA-MLA perf (#5468)
lowsfer Jun 26, 2025
2eb6502
feat: Add support for TRTLLM CustomDataset (#5511)
kaiyux Jun 26, 2025
3a1f4d4
[feat] Add progress bar to benchmark (#5173)
arekay-nv Jun 26, 2025
baf7eaa
Add trtllm-bench reviewers. (#5452)
FrankD412 Jun 26, 2025
1633bd2
[CI] move flashinfer llama tests to post merge (#5506)
omera-nv Jun 26, 2025
6bae76d
[fix][ci] move torch tests to run under torch stage (#5473)
omera-nv Jun 26, 2025
8dfa31c
refactor: remove batch_manager::KvCacheConfig and use executor::KvCac…
Funatiq Jun 26, 2025
8836990
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chun…
jmydurant Jun 26, 2025
de7cd0d
fix: MoE autotune fallback failed to query default heuristic (#5520)
rosenrodt Jun 26, 2025
69c4ef2
Update allow list 2025_06_26 (#5526)
yuanjingx87 Jun 26, 2025
0083228
fix: Mapping rank boundary check bug (#4935)
venkywonka Jun 26, 2025
aa6e015
Update trtllm-bench to support new Pytorch default. (#5491)
FrankD412 Jun 27, 2025
0f3bd78
[TRTLLM-4971]: Use safe deserialization in ParallelConfig (#4630)
yibinl-nvidia Jun 27, 2025
a3494be
tests: waive failed tests on main (#5512)
xinhe-nv Jun 27, 2025
dc36228
fix: Fix block scale fp8 support for deepseek v3 on Blackwell. (#5514)
yuxianq Jun 27, 2025
49af791
Add testing for trtllm-llmapi-launch with tritonserver (#5528)
Tabrizian Jun 27, 2025
ef43b95
Fix execute_process: check results using EQUAL (#5481)
yuantailing Jun 27, 2025
83a1f60
feat: Expose bias and FP8_MXFP4 MOE CUTLASS backend features to pytor…
djns99 Jun 27, 2025
980030c
[Infra] - Waive failed case in post-merge (#5536)
EmmaQiaoCh Jun 27, 2025
73b8a95
feat: Use inference mode in update_requests to improve perf of TRTLLM…
dcampora Jun 27, 2025
7f1893f
ci: waive flaky test test_llama_eagle3 (#5548)
syuoni Jun 27, 2025
a608b00
Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) (#5519)
ChristinaZ Jun 27, 2025
6fc1c6f
[fix][ci] correct unittests test prefix (#5547)
omera-nv Jun 27, 2025
cb58073
Fix : fix build for sm120 (#5265)
peaceh-nv Jun 27, 2025
56cdfe5
[TRTLLM-5000][feat] NGrams V2 (#4569)
wili-65535 Jun 27, 2025
833c0de
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)
achartier Jun 27, 2025
a8141a4
refactor: Speculative decoding buffers part 2 (#5316)
Funatiq Jun 27, 2025
5437075
ReDrafter support for Qwen (#4875)
darraghdog Jun 27, 2025
26b953e
[nvbugs/5309940] Add support for input output token counts (#5445)
Tabrizian Jun 27, 2025
5773cfd
feat: Add support for per expert activation scaling factors (#5013)
djns99 Jun 27, 2025
6021a43
Make moe permute and final as custom op (#5412)
limin2021 Jun 27, 2025
619709f
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
lucaslie Jun 28, 2025
9db769e
[Infra] - Add import pytest (#5565)
EmmaQiaoCh Jun 29, 2025
a985c0b
tests: Move stress tests to be Post-Merge only (#5166)
amirkl94 Jun 29, 2025
de97799
feat: Add support for YARN in NemotronNAS models (#4906)
amirkl94 Jun 29, 2025
70e34a3
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-se…
talorabr Jun 29, 2025
a1c1c6b
[CI] reduce mamba2 ssm test parameterization (#5571)
tomeras91 Jun 29, 2025
6000380
perf: Avoid reswizzle_sf after allgather. (#5504)
bobboli Jun 29, 2025
94dc97a
[feat][test] reuse MPI pool executor across tests (#5566)
omera-nv Jun 29, 2025
b4dab23
[TRTLLM-5965] perf: Optimize MoE sort kernels for large-scale EP (#5435)
syuoni Jun 29, 2025
64db7d2
[feat] Optimizations on weight-only batched gemv kernel (#5420)
Njuapp Jun 30, 2025
2780fc2
[ci] remove MMLU if followed by GSM8K (#5578)
omera-nv Jun 30, 2025
578430e
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config p…
nv-guomingz Jun 30, 2025
4fef14d
Deduplicate waive list (#5546)
yiqingy0 Jun 30, 2025
1db63c2
[fix] speedup modeling unittests (#5579)
omera-nv Jun 30, 2025
852b790
feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459)
dongjiyingdjy Jun 30, 2025
42a9385
[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570)
WeiHaocheng Jun 30, 2025
2ce200f
doc: Minor update to DeepSeek R1 best practice (#5600)
kaiyux Jun 30, 2025
6cbc9a5
[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568)
lfr-0531 Jun 30, 2025
9bdc595
refactor: decoder state setup (#5093)
Funatiq Jun 30, 2025
b8a568d
[Infra][main] Cherry-pick from release/0.21: Update nccl to 2.27.5 (#…
EmmaQiaoCh Jun 30, 2025
38a3977
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instruc…
ixlmar Jun 30, 2025
42134b8
[ci] move eagle1 and medusa tests to post-merge (#5604)
omera-nv Jun 30, 2025
98a7c24
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Superjomn Jun 30, 2025
3b19634
[fix][ci] missing class names in post-merge test reports (#5603)
omera-nv Jun 30, 2025
16fc993
refactor: [TRTLLM-6150] Refactor moe permute and finalize op by remov…
limin2021 Jun 30, 2025
6e48ac2
chore: remove cuda_graph_ prefix from cuda_graph_config filed members…
nv-guomingz Jun 30, 2025
f28cd30
feat: AutoDeploy fp8 quantization support for bmm (#3849)
meenchen Jun 30, 2025
6ee94c7
Reintroduce with perf fixes: feature: unify new_tokens format sample …
netanel-haber Jun 30, 2025
7cf1209
[fix]: Fix main test skip issue (#5503)
yizhang-nv Jul 1, 2025
8caaf68
chores: [TRTLLM-6072] 1.0 LLMAPI doc updates (#5629)
hchings Jul 1, 2025
82547f7
add feature support matrix for PyTorch backend (#5037)
QiJune Jul 1, 2025
9b17b29
test: [CI] remove closed bugs (#5572)
xinhe-nv Jul 1, 2025
a8cf611
test: [CI] Add failed cases into waives.txt (#5569)
xinhe-nv Jul 1, 2025
7135b27
rcca: test default kv_cache_reuse option for pytorch multimodal (#5544)
StanleySun639 Jul 1, 2025
34212e2
[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend…
xuanzic Jul 1, 2025
19c56f0
test: [CI] Add failed cases into waives.txt (#5582)
xinhe-nv Jul 1, 2025
7a617ad
feat: W4A16 GEMM (#4232)
danielafrimi Jul 1, 2025
5f77d21
test: Reduce number of C++ test cases (#5437)
Funatiq Jul 1, 2025
071ad75
[https://nvbugs/5318059][test] Unwaive test (#5624)
pamelap-nvidia Jul 1, 2025
65c2b93
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
EmmaQiaoCh Jul 1, 2025
61c5a53
[#5403][perf] Conditionally enable SWAP AB for speculative decoding (…
zoheth Jul 1, 2025
a5eff13
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Superjomn Jul 1, 2025
872610a
doc: cherry pick #5334 (#5368)
MartinMarciniszyn Jun 19, 2025
61213e3
tests: fix typos in qa test (#5421)
crazydemo Jun 25, 2025
4ef60d5
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mis…
brb-nv Jun 25, 2025
3789ba1
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Wanli-Jiang Jun 25, 2025
ded203d
test: set enable_attention_dp=True in default deepseek settings (#5461)
ruodil Jun 25, 2025
be5ddb0
Fix permission for local user issues in NGC docker container. (#5373)
MartinMarciniszyn Jun 25, 2025
ee7fcbf
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Superjomn Jun 25, 2025
93edfea
[nvbug/5354825] Fix nougat test image url (#5496)
amukkara Jun 26, 2025
4b3f2db
fix: fix regression in LOCAL_USER (#5517)
ixlmar Jun 26, 2025
48eee33
fix: constrain grepping in docker/Makefile (#5493)
ixlmar Jun 26, 2025
178fc3f
[Infra][release/0.21] - waive failed tests (#5537)
EmmaQiaoCh Jun 27, 2025
3bc703d
ci: unwaive llmapi launch test (#5281)
Superjomn Jun 27, 2025
d68fa72
refactor: Clean up DecodingInput and DecodingOutput (#5617)
Funatiq Jul 1, 2025
f9a4556
perf: Use tokenizers API to optimize incremental detokenization perf …
kaiyux Jul 1, 2025
c345f58
[feat] Support torch compile for attention dp (#5086)
liji-nv Jul 1, 2025
fa95e40
feat: add LLmArgs option to force using dynamic quantization (#5346)
achartier Jul 1, 2025
1341ffd
[TRTLLM-5644][infra] Update the community action to more appropriate …
poweiw Jul 1, 2025
efef911
fix: add missing self. from PR #5346 (#5653)
achartier Jul 2, 2025
ba2ab50
[Bug] attention DP doesn't work with embedding TP (#5642)
PerkzZheng Jul 2, 2025
10c5051
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
HuiGao-NV Jul 2, 2025
7992869
perf: better heuristic for allreduce (#5432)
yilin-void Jul 2, 2025
32dfdfb
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
xiaoweiw-nv Jul 2, 2025
caf27ca
[chore] 2025-07-02 update github CI allowlist (#5661)
niukuo Jul 2, 2025
3e75320
Add pd dynamic scaling readme (#5540)
Shunkangz Jul 2, 2025
2d69b55
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Superjomn Jul 2, 2025
ca7b6ec
Feat/pytorch vswa kvcachemanager (#5151)
qixiang-99 Jul 2, 2025
4cd8543
[TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic…
Funatiq Jul 2, 2025
77082cd
[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add sp…
jhaotingc Jul 2, 2025
31699cb
[Infra] - Set default timeout to 1hr and remove some specific setting…
EmmaQiaoCh Jul 2, 2025
04fa6c0
[TRTLLM-6143] feat: Improve dev container tagging (#5551)
ixlmar Jul 2, 2025
afef512
feat:[AutoDeploy] E2E build example for llama4 VLM (#3922)
Fridah-nv Jul 2, 2025
3a46cf2
fix: Fix missing arg to alltoall_prepare_maybe_dispatch (#5669)
syuoni Jul 3, 2025
2a5fdeb
[Infra] - Waive failed tests for main 0702 (#5671)
EmmaQiaoCh Jul 3, 2025
3c9dd5c
chore: bump version to 1.0.0rc2 (#5645)
yiqingy0 Jul 3, 2025
7dbecf7
[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646)
tomeras91 Jul 3, 2025
de0b522
[Infra] - Fix test stage check for the package sanity check stage (#5…
yiqingy0 Jul 3, 2025
5308973
[Infra] - Waive a failed case on main (#5702)
EmmaQiaoCh Jul 3, 2025
dccbfc8
fix: Set init value for moe expert id (#5660)
WeiHaocheng Jul 3, 2025
c728561
[ci] small multigpu speedups (#5643)
omera-nv Jul 3, 2025
f91379b
delete duplicate eagle3 and ngram tests (#5711)
netanel-haber Jul 3, 2025
1a3bd14
chore: Remove unused isFullContextRequest method (#5666)
Funatiq Jul 3, 2025
8dad22c
chore: refine the default value by using pydantic default instead of …
nv-guomingz Jul 3, 2025
2b0c87e
[ModelLoad] Concurrent load model (#5291)
arekay Jul 3, 2025
528ff52
[https://nvbugs/5365714] fix(scaffolding): use default LLM rather tha…
dc3671 Jul 3, 2025
0566fa1
[None][infra] Update the auto-community label action to be triggered …
poweiw Jul 3, 2025
aa72d39
MTP and derivatives: Align sample state with trtllm sampler sample st…
netanel-haber Jul 3, 2025
24ac9b5
[AutoDeploy] merge feat/ad-2025-06-29 (#5737)
lucaslie Jul 4, 2025
7a31952
feat: support more parameters in openai worker of scaffolding (#5115)
ccs96307 Jul 4, 2025
4762e0b
Waive tests : test_openai_lora, test_trtllm_serve_lora_example and te…
venkywonka Jul 4, 2025
7f837b6
tests: waive failures on main (#5704)
xinhe-nv Jul 4, 2025
77288d3
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Superjomn Jun 30, 2025
d0b3d2a
fix:https://nvbugs/5362398 (#5609)
nv-guomingz Jun 30, 2025
cb9f596
[nvbug 5300551] test: increase block count in eviction test (#5465)
zhengd-nv Jul 1, 2025
73d30a2
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
yizhang-nv Jul 1, 2025
ab488a5
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
kaiyux Jul 1, 2025
819ae90
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking ca…
FrankD412 Jul 2, 2025
cdaa6ab
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
brb-nv Jul 3, 2025
a0135c0
[Infra] - Waive failed cases on release/0.21 (#5674)
EmmaQiaoCh Jul 3, 2025
94f0252
Doc: Update invalid hugging face URLs (#5683)
Linda-Stadter Jul 3, 2025
134b238
[fix: nvbugs/5355493] Correctly clamp max sequence len to max attenti…
netanel-haber Jul 4, 2025
a79d8c9
Fix none response in PD (#5422)
Shunkangz Jul 4, 2025
32b244a
feat: reduce unnecessary kernel generation (#5476)
tongyuantongyu Jul 4, 2025
c434147
chore: update doc by replacing use_cuda_graph with cuda_graph_config …
nv-guomingz Jul 4, 2025
e134a52
Perf: reduce DeepEPLowLatency memory and time (#5712)
yuantailing Jul 4, 2025
b8fef80
[Infra] - Waive L0 test (#5748)
yiqingy0 Jul 4, 2025
07f9cf1
fix: Improve chunking test and skip empty kernel calls (#5710)
Funatiq Jul 4, 2025
81c0764
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120…
farazkh80 Jul 4, 2025
3869b96
test: [CI] Add failed cases into waives.txt (#5718)
xinhe-nv Jul 4, 2025
471bf0b
fix: check file exists in dev container script (#5755)
ixlmar Jul 4, 2025
32339d1
Raise shut down error for each request (#4936)
Shunkangz Jul 4, 2025
7f3ea05
[Infra] - Waive L0 flaky test (#5759)
yiqingy0 Jul 4, 2025
3ed3bbc
Fix: pass allreduce strategy to pytorchConfig (#5746)
HuiGao-NV Jul 4, 2025
ffc0b8f
Cache transceiver support VSWA (#5505)
chuangz0 Jul 4, 2025
d1112aa
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow…
stnie Jul 4, 2025
d61893d
[fix] Update to properly set cuda graphs in trtllm-bench overrides. (…
FrankD412 Jul 4, 2025
1b588f8
feat: KV events for sliding window attention (#5580)
jthomson04 Jul 4, 2025
089fd55
Add dummy all_reduce for kernel breakdown (#5745)
qiaoxj07 Jul 5, 2025
b1976c2
Add wide-ep benchmarking scripts (#5760)
qiaoxj07 Jul 5, 2025
6bddaf6
chore: Improve documentation of Kv_block_array (#5765)
hypdeb Jul 5, 2025
d95ae13
[Infra] - Always use x86 image for the Jenkins agent and few clean-up…
chzblych Jul 6, 2025
ae27261
refactor: decoding inputs (#5679)
Funatiq Jul 6, 2025
2013034
[Test] - Waive or fix few known test failures (#5769)
chzblych Jul 6, 2025
66f299a
[TRTLLM-5878] add stage for image registration to nspect (#5699)
niukuo Jul 6, 2025
ec6c7df
feat: Add support for MXFP8xMXFP4 in pytorch (#5535)
djns99 Jul 6, 2025
85e934a
[Doc] update the document of qwen3 and cuda_graph usage (#5703)
byshiue Jul 7, 2025
092e0eb
[Infra] - Fix a syntax issue in the image check (#5775)
chzblych Jul 7, 2025
de10774
chore: log stack trace on error in openai server (#5749)
zhengd-nv Jul 7, 2025
9db2e9e
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
bobboli Jul 7, 2025
12d8c7d
Refactor the topk parallelization part for the routing kernels (#5567)
ChristinaZ Jul 7, 2025
ded38eb
test: [CI] remove closed bugs (#5770)
xinhe-nv Jul 7, 2025
dfce61f
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sam…
Superjomn Jul 7, 2025
ed1b3c8
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1…
yizhang-nv Jul 7, 2025
5ca2b9b
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5…
DylanChen-NV Jul 7, 2025
1260e2f
feat: Optimize TRTLLM Sampler perf single beam single step (#5550)
dcampora Jul 7, 2025
85b4a68
Refactor: move DeepEP from Docker images to wheel building (#5534)
yuantailing Jul 7, 2025
30a19fc
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#…
Funatiq Jul 7, 2025
1191555
[ci] speedup fused moe tests (#5726)
omera-nv Jul 7, 2025
a1235ee
[feat] Adds optional module cache for TRT-LLM Gen Gemm interfaces (#5…
davidclark-nv Jul 7, 2025
5a8173c
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…
nv-guomingz Jul 8, 2025
5bc3a15
feat: add MultimodalParams & putting all multimodal params into it an…
yechank-nvidia Jul 8, 2025
0be41b6
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_ena…
nv-guomingz Jul 8, 2025
95978e3
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve t…
liji-nv Jul 8, 2025
664bf95
[fix] improve fp4_block_scale_moe_runner type check (#5681)
Alcanderian Jul 8, 2025
b2da16b
feat(scaffolding): add streaming scaffolding_llm.generate_async support
dc3671 Jun 17, 2025
136fa19
feat(scaffolding): yield two tasks in dynasor to eliminate waiting
dc3671 Jun 25, 2025
c033d5b
fix dynasor example print
dc3671 Jun 26, 2025
de9cfba
fix
dc3671 Jul 7, 2025
Note: the diff is too large to display in full; only the first 3000 changed files are loaded.
10 changes: 10 additions & 0 deletions .devcontainer/devcontainer.env
@@ -0,0 +1,10 @@
# Environment variables used to configure the Dev Container setup.
#
# The syntax needs to be compatible with
# https://docs.docker.com/compose/how-tos/environment-variables/variable-interpolation/#env-file-syntax
#
# Edit this file as necessary. For local changes not to be committed back
# to the repository, create/edit devcontainer.env.user instead.
HF_HOME_DEFAULT="${HOME}/.cache/huggingface"
HF_HOME_XDG_DEFAULT="${XDG_CACHE_HOME:-${HF_HOME_DEFAULT}}"
LOCAL_HF_HOME="${HF_HOME:-${HF_HOME_XDG_DEFAULT}}"
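
The three assignments above chain POSIX "${VAR:-default}" fallbacks: LOCAL_HF_HOME resolves to HF_HOME if set, otherwise to XDG_CACHE_HOME, otherwise to ~/.cache/huggingface. A minimal Python sketch of the same resolution order (illustrative only, not part of the change):

import os

# ${VAR:-default} falls back when VAR is unset or empty; `or` mirrors that.
home = os.environ["HOME"]
hf_home_default = f"{home}/.cache/huggingface"
hf_home_xdg_default = os.environ.get("XDG_CACHE_HOME") or hf_home_default
local_hf_home = os.environ.get("HF_HOME") or hf_home_xdg_default
print(local_hf_home)  # e.g. /home/user/.cache/huggingface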
16 changes: 6 additions & 10 deletions .devcontainer/devcontainer.json
@@ -3,24 +3,18 @@
 {
     "name": "TRT-LLM Devcontainer",
     "dockerComposeFile": [
-        "docker-compose.yml"
+        "docker-compose.yml",
+        "docker-compose.override.yml"
     ],
     "service": "tensorrt_llm-dev",
     "remoteUser": "ubuntu",
     "containerEnv": {
-        // "CCACHE_DIR" : "/home/coder/${localWorkspaceFolderBasename}/cpp/.ccache",
-        // "CCACHE_BASEDIR" : "/home/coder/${localWorkspaceFolderBasename}",
         "HF_TOKEN": "${localEnv:HF_TOKEN}",
         "HF_HOME": "/huggingface",
         "HISTFILE": "${containerWorkspaceFolder}/.cache/._bash_history"
     },
     "workspaceFolder": "/workspaces/tensorrt_llm",
-    // "workspaceFolder": "/home/coder/${localWorkspaceFolderBasename}",
-    // "workspaceMount": "source=${localWorkspaceFolder},target=/home/coder/${localWorkspaceFolderBasename},type=bind,consistency=consistent",
-    "mounts": [
-        "source=${localEnv:HOME}/.cache/huggingface,target=/huggingface,type=bind", // HF cache
-        "source=/home/scratch.trt_llm_data/,target=/home/scratch.trt_llm_data/,type=bind,consistency=consistent"
-    ],
+    "initializeCommand": "cd ${localWorkspaceFolder} && ./.devcontainer/make_env.py",
     // Note: sourcing .profile is required since we use a local user and the python interpreter is
     // global (/usr/bin/python). In this case, pip will default to a local user path which is not
     // by default in the PATH. In interactive devcontainer shells, .profile is sourced by default.
@@ -43,7 +37,9 @@
         // "ms-vscode.cmake-tools",
         // Git & Github
         // "GitHub.vscode-pull-request-github"
-        "eamodio.gitlens"
+        "eamodio.gitlens",
+        // Docs
+        "ms-vscode.live-server"
     ],
     "settings": {
         "C_Cpp.intelliSenseEngine": "disabled",
8 changes: 8 additions & 0 deletions .devcontainer/docker-compose.override-example.yml
@@ -0,0 +1,8 @@
# Example .devcontainer/docker-compose.override.yml
version: "3.9"
services:
  tensorrt_llm-dev:
    volumes:
      # Uncomment the following lines to enable
      # # Mount TRTLLM data volume:
      # - /home/scratch.trt_llm_data/:/home/scratch.trt_llm_data/:ro
5 changes: 3 additions & 2 deletions .devcontainer/docker-compose.yml
@@ -1,7 +1,7 @@
 version: "3.9"
 services:
   tensorrt_llm-dev:
-    image: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.05-py3-x86_64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202506051650-4885
+    image: ${DEV_CONTAINER_IMAGE}
     network_mode: host
     ipc: host

@@ -22,7 +22,8 @@ services:
       capabilities: [gpu]

     volumes:
-      - ..:/workspaces/tensorrt_llm:cached
+      - ${SOURCE_DIR}:/workspaces/tensorrt_llm
+      - ${LOCAL_HF_HOME}:/huggingface # HF cache

     environment:
       - CCACHE_DIR=/workspaces/tensorrt_llm/cpp/.ccache
221 changes: 221 additions & 0 deletions .devcontainer/make_env.py
@@ -0,0 +1,221 @@
#!/usr/bin/env python3

import json
import logging
import os
import re
import shlex
import subprocess
import sys
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, List, Optional

JENKINS_PROPS_PATH = Path("jenkins/current_image_tags.properties")
DEV_CONTAINER_ENV_PATH = Path(".devcontainer/devcontainer.env")
DEV_CONTAINER_USER_ENV_PATH = Path(".devcontainer/devcontainer.env.user")
DOT_ENV_PATH = Path(".devcontainer/.env")
COMPOSE_OVERRIDE_PATH = Path(".devcontainer/docker-compose.override.yml")
COMPOSE_OVERRIDE_EXAMPLE_PATH = Path(
    ".devcontainer/docker-compose.override-example.yml")

HOME_DIR_VAR = "HOME_DIR"
SOURCE_DIR_VAR = "SOURCE_DIR"
DEV_CONTAINER_IMAGE_VAR = "DEV_CONTAINER_IMAGE"
BUILD_LOCAL_VAR = "BUILD_LOCAL"
JENKINS_IMAGE_VAR = "LLM_DOCKER_IMAGE"
LOCAL_HF_HOME_VAR = "LOCAL_HF_HOME"

LOGGER = logging.getLogger("make_env")


def _load_env(env_files: List[Path]) -> Dict[str, str]:
    """Evaluate files using 'sh' and return resulting environment."""
    with TemporaryDirectory("trtllm_make_env") as temp_dir:
        json_path = Path(temp_dir) / 'env.json'
        subprocess.run(
            ("(echo set -a && cat " +
             " ".join(shlex.quote(str(env_file)) for env_file in env_files) +
             " && echo && echo exec /usr/bin/env python3 -c \"'import json; import os; print(json.dumps(dict(os.environ)))'\""
             + f") | sh > {json_path}"),
            shell=True,
            check=True,
        )
        with open(json_path, "r") as f:
            env = json.load(f)
    return env


def _detect_rootless() -> bool:
    proc = subprocess.run("./docker/detect_rootless.sh",
                          capture_output=True,
                          check=True,
                          shell=True)
    return bool(int(proc.stdout.decode("utf-8").strip()))


def _handle_rootless(env_inout: Dict[str, str]):
    is_rootless = _detect_rootless()
    if is_rootless:
        LOGGER.info("Docker Rootless Mode detected.")
        if HOME_DIR_VAR not in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires setting HOME_DIR in devcontainer.env.user"
            )
        if SOURCE_DIR_VAR not in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires setting SOURCE_DIR in devcontainer.env.user"
            )

        # Handle HF_HOME
        if "HF_HOME" in os.environ and "HF_HOME" in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires either not setting HF_HOME at all or overriding it in devcontainer.env.user"
            )
        if env_inout[LOCAL_HF_HOME_VAR].startswith(env_inout["HOME"]):
            env_inout[LOCAL_HF_HOME_VAR] = env_inout[LOCAL_HF_HOME_VAR].replace(
                env_inout["HOME"], env_inout[HOME_DIR_VAR], 1)
    else:
        env_inout[HOME_DIR_VAR] = env_inout["HOME"]
        env_inout[SOURCE_DIR_VAR] = os.getcwd()


def _select_prebuilt_image(env: Dict[str, str]) -> Optional[str]:
    # Jenkins image
    candidate_images: List[str] = [env[JENKINS_IMAGE_VAR]]

    # NGC images
    proc = subprocess.run(
        r"git tag --sort=creatordate --merged=HEAD | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+' | sed -E 's/^v(.*)$/\1/' | tac",
        shell=True,
        capture_output=True,
        check=True,
    )
    # stdout is bytes without text=True; decode before building tag strings.
    for git_tag in proc.stdout.decode("utf-8").splitlines():
        git_tag = git_tag.strip()
        candidate_images.append(f"nvcr.io/nvidia/tensorrt-llm/devel:{git_tag}")

    # Check image availability
    for candidate_image in candidate_images:
        LOGGER.info(f"Trying image {candidate_image}")

        try:
            subprocess.run(
                f"docker run --rm -it --pull=missing --entrypoint=/bin/true {shlex.quote(candidate_image)}",
                check=True,
                shell=True)
        except subprocess.CalledProcessError:
            continue

        LOGGER.info(f"Using image {candidate_image}")
        return candidate_image

    LOGGER.info("No pre-built image found!")
    return None


def _build_local_image() -> str:
    LOGGER.info("Building container image locally")

    with TemporaryDirectory("trtllm_make_env") as temp_dir:
        log_path = Path(temp_dir) / "build.log"
        subprocess.run(
            f"make -C docker devel_build | tee {shlex.quote(str(log_path))}",
            check=True,
            shell=True,
        )
        with open(log_path) as f:
            build_log = f.read()

    # Handle escaped and actual line breaks
    build_log_lines = re.sub(r"\\\n", " ", build_log).splitlines()
    for build_log_line in build_log_lines:
        tokens = shlex.split(build_log_line)
        if tokens[:3] != ["docker", "buildx", "build"]:
            continue
        token = None
        while tokens and not (token := tokens.pop(0)).startswith("--tag"):
            pass
        if token is None:
            continue
        # Accept both the "--tag=IMAGE" and "--tag IMAGE" forms.
        if token.startswith("--tag="):
            token = token.removeprefix("--tag=")
        else:
            if not tokens:
                continue
            token = tokens.pop(0)
        return token  # this is the image URI
    raise RuntimeError(
        f"Could not parse --tag argument from build log: {build_log}")


def _ensure_compose_override():
    if not COMPOSE_OVERRIDE_PATH.exists():
        LOGGER.info(
            f"Creating initial {COMPOSE_OVERRIDE_PATH} from {COMPOSE_OVERRIDE_EXAMPLE_PATH}"
        )
        COMPOSE_OVERRIDE_PATH.write_bytes(
            COMPOSE_OVERRIDE_EXAMPLE_PATH.read_bytes())


def _update_dot_env(env: Dict[str, str]):
    LOGGER.info(f"Updating {DOT_ENV_PATH}")

    output_lines = [
        "# NOTE: This file is generated by make_env.py, modify devcontainer.env.user instead of this file.\n",
        "\n",
    ]

    for env_key, env_value in env.items():
        if os.environ.get(env_key) == env_value:
            # Only storing differences w.r.t. base env
            continue
        output_lines.append(f"{env_key}=\"{shlex.quote(env_value)}\"\n")

    with open(DOT_ENV_PATH, "w") as f:
        f.writelines(output_lines)


def main():
    env_files = [
        JENKINS_PROPS_PATH,
        DEV_CONTAINER_ENV_PATH,
    ]

    if DEV_CONTAINER_USER_ENV_PATH.exists():
        env_files.append(DEV_CONTAINER_USER_ENV_PATH)

    env = _load_env(env_files)
    _handle_rootless(env_inout=env)

    # Determine container image to use
    image_uri = env.get(DEV_CONTAINER_IMAGE_VAR)
    if image_uri:
        LOGGER.info(f"Using user-provided container image: {image_uri}")
    else:
        build_local = bool(int(
            env[BUILD_LOCAL_VAR].strip())) if BUILD_LOCAL_VAR in env else None
        image_uri = None
        if not build_local:
            image_uri = _select_prebuilt_image(env)
        if image_uri is None:
            if build_local is False:
                raise RuntimeError(
                    "No suitable container image found and local build disabled."
                )
            image_uri = _build_local_image()
            LOGGER.info(f"Using locally built container image: {image_uri}")
        env[DEV_CONTAINER_IMAGE_VAR] = image_uri

    _ensure_compose_override()

    _update_dot_env(env)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    try:
        main()
    except Exception as e:
        LOGGER.error(f"{e.__class__.__name__}: {e}")
        sys.exit(-1)
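
This script is run by the "initializeCommand" added to devcontainer.json above: it sources the env files in order under `sh` with `set -a`, so entries in later files (e.g. devcontainer.env.user) override earlier ones, then writes the result to .devcontainer/.env for docker compose. A self-contained sketch of that precedence, using hypothetical values:

import subprocess

# Later assignments win, mirroring how _load_env() sources the Jenkins
# properties, devcontainer.env, then devcontainer.env.user in order.
script = """
set -a
DEV_CONTAINER_IMAGE=jenkins-image:a   # from an earlier file
DEV_CONTAINER_IMAGE=my-image:dev      # from a later file
env
"""
out = subprocess.run(["sh", "-c", script],
                     capture_output=True, text=True, check=True)
print([line for line in out.stdout.splitlines()
       if line.startswith("DEV_CONTAINER_IMAGE=")])
# -> ['DEV_CONTAINER_IMAGE=my-image:dev']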
3 changes: 2 additions & 1 deletion .gitattributes
@@ -1,7 +1,8 @@
 *.a filter=lfs diff=lfs merge=lfs -text
-*.dll filter=lfs diff=lfs merge=lfs -text
 *.lib filter=lfs diff=lfs merge=lfs -text
 *.so filter=lfs diff=lfs merge=lfs -text
+*.dll filter=lfs diff=lfs merge=lfs -text
+*.txz filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 triton_backend/tools/gpt/input_data.json filter=lfs diff=lfs merge=lfs -text
 *cubin.cpp filter=lfs diff=lfs merge=lfs -text
6 changes: 6 additions & 0 deletions .github/CODEOWNERS
@@ -14,6 +14,12 @@
 /tensorrt_llm/_torch/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
 /tensorrt_llm/examples/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs

+## TensorRT-LLM trtllm-bench Reviewers
+/tensorrt_llm/bench @NVIDIA/trtllm-bench-reviewers
+/tensorrt_llm/commands/bench.py @NVIDIA/trtllm-bench-reviewers
+docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
+
+
 # The rule below requires that any PR modifying public APIs must be approved by at least one member
 # of the NVIDIA/trt-llm-committed-api-review-committee or NVIDIA/trt-llm-noncommitted-api-review-committee team.
 # This approval is mandatory regardless of other approvals the PR may have received. Without approval