Closed
Changes from all commits
Commits
556 commits
194a708
[fix] Fix test_attention_mla (#5084)
jinyangyuan-nvidia Jun 10, 2025
6cb2b7d
CI: Allow run (#5101)
IzzyPutterman Jun 10, 2025
fcd7192
[fix] Unwaive test_llama_eagle3 (#5042)
mikeiovine Jun 10, 2025
1b79041
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlo…
bobboli Jun 11, 2025
580a925
test: conditional disagg and cache aware balancing for deepseek v3 (#…
zhengd-nv Jun 11, 2025
273c6b9
[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the ro…
ChristinaZ Jun 11, 2025
035b048
infra: Add timeout and retry for wget in docker image build (#5035)
ZhanruiSunCh Jun 11, 2025
0a9f105
Waive L0 tests (#5111)
yiqingy0 Jun 11, 2025
00991d1
chore: Merge remaining changes from feat/large-ep branch to main (#5039)
syuoni Jun 11, 2025
fdf1c47
[TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836)
dcampora Jun 11, 2025
e2863a3
chore: bump version to 0.21.0rc2 (#5112)
ZhanruiSunCh Jun 11, 2025
56abae0
test: add more llama_v3.3_70b cases in perf test (#4979)
ruodil Jun 11, 2025
8282d6c
[fix] Fix llama4 min latency (#5117)
liji-nv Jun 11, 2025
a90dd57
[TRTLLM-5082] - Add a bot run option for detailed logs (#4390)
yiqingy0 Jun 11, 2025
11b94fe
test: skip disaggregated tests on arm (#5070)
xinhe-nv Jun 11, 2025
ddfe4fc
[chore] 2025-06-10 update allowlist (#5102)
tburt-nv Jun 11, 2025
ad99a08
[TRTLLM-5581][infra] Update Module Owners (#5052)
poweiw Jun 12, 2025
ee44fa0
chore: rename IOFormatter to BaseCacheFormatter (#5068)
zhengd-nv Jun 12, 2025
c592798
fix: limit process pool size when prefetching (#5088)
zhengd-nv Jun 12, 2025
4319237
Use backend to replace macro to control enablement of MNNVL all reduc…
HuiGao-NV Jun 12, 2025
e692779
Solve underallocation in VSWA+/VGQA (#4667)
netanel-haber Jun 12, 2025
49d7268
[nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade …
lucaslie Jun 12, 2025
c3b2eb6
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultr…
venkywonka Jun 12, 2025
0daa709
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#…
moraxu Jun 12, 2025
505678a
update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114)
byshiue Jun 12, 2025
06d9f1e
[test] Use LLM API for Nemotron-H correctness test (#5097)
tomeras91 Jun 12, 2025
d021cc5
test: set enable_attention_dp to False for non-deepseek models and ad…
ruodil Jun 12, 2025
53983ad
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#…
moraxu Jun 12, 2025
e462677
Fix logprobs issues. (#5136)
dcampora Jun 12, 2025
4d070d3
chore: fix typo in tests (#5092)
lfr-0531 Jun 12, 2025
10ab979
[fix] Do not reuse dummy request KVCache (#4804)
liji-nv Jun 12, 2025
a97f458
infra: upload imageTag info to artifactory and add ngc_staging to sav…
ZhanruiSunCh Jun 12, 2025
b563696
doc:fix invalid links for trtllm-serve doc (#5145)
nv-guomingz Jun 12, 2025
88cba5f
test: waive the NIXL related tests (#5153)
Shixiaowei02 Jun 12, 2025
59c9588
enh(doc): Add `ci-overview` in `docs/source/reference/` (#5137)
venkywonka Jun 12, 2025
22281cf
doc: Added documentation for enable_trtllm_sampler. (#4990)
dcampora Jun 12, 2025
58d4ca2
fix:remove duplicated trust_remote_code knob from trtllm-serve (#5143)
nv-guomingz Jun 12, 2025
cf35a07
fix:https://nvbugs/5298661 (#5022)
nv-guomingz Jun 12, 2025
8cfb567
fix: Updates to yarn implementation (#5105)
brb-nv Jun 12, 2025
dfeeaf6
Move allreduce_strategy from committed api to reference (#5147)
HuiGao-NV Jun 12, 2025
690873b
[nvbug/5334370][fix] Fix one model EAGLE3 (#5134)
mikeiovine Jun 12, 2025
655bce0
[fix][test] report individual unittests results to jenkins (#5116)
omera-nv Jun 12, 2025
3a04c9f
chore: Include prompt_token_ids only for context-only disagg requests…
pcastonguay Jun 12, 2025
cc2a134
None: fix OOM because of unnecessary mha workspace (#5056)
ttyio Jun 12, 2025
a0b6c63
[feat] trtllmGen MoE routing: added support for top groups and top K …
MatthiasKohl Jun 12, 2025
38a907a
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptanc…
lfr-0531 Jun 13, 2025
4ae46b6
fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedM…
yuxianq Jun 13, 2025
a891013
[feat] Optimize KV Cache Reuse for MLA (#4869)
zhhuang-nv Jun 13, 2025
fa582cb
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtyp…
ruodil Jun 13, 2025
d9be419
tests: update tests for b200 (#5180)
xinhe-nv Jun 13, 2025
b79eb34
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
yibinl-nvidia Jun 13, 2025
dec326b
[fix] Reenable test return logits (#5160)
dcampora Jun 13, 2025
01bd4c0
Add two MTP disaggregated test (#4546)
Tabrizian Jun 13, 2025
28cd536
[test] Update timeout params in QA test list (#5124)
crazydemo Jun 13, 2025
4d0a5ad
chore: gracefully exit disagg process in tests; better startup and lo…
zhengd-nv Jun 13, 2025
514baf1
[fix] Fix comment to pass guardwords check (#5191)
MatthiasKohl Jun 13, 2025
12e075e
[nvbug 5333996 ][fix] Unload XQA cubins early to avoid static lifetim…
lowsfer Jun 13, 2025
30c5b41
refactoring: port customized kernels with public cutlass version (#5027)
yunruis Jun 13, 2025
b959618
refactor [BREAKING CHANGE]: remove the redundant use_kv_cache field …
nv-guomingz Jun 13, 2025
30d9d0f
test: [CI] Add failed cases into waives.txt (#5178)
xinhe-nv Jun 13, 2025
089be89
feat: Basic skeleton for Gemma3 VLM (#5108)
brb-nv Jun 13, 2025
e96d686
add doc for open-sourced cutlass kernels (#5194)
yunruis Jun 13, 2025
e5be3a9
fix: fix license bug (#5200)
yunruis Jun 13, 2025
8e99370
ucxx only use ucp_feature_tag to avoid some issues on some platforms (…
chuangz0 Jun 13, 2025
952f33d
CI: move all test cases of TensorRT backend into post merge (#5186)
QiJune Jun 13, 2025
3d87770
[https://nvbugspro.nvidia.com/bug/5295470] support headDim 256 for bl…
PerkzZheng Jun 13, 2025
25aa388
[nvbug/5319281][fix] Stop drafting when we hit the draft model's max …
mikeiovine Jun 13, 2025
06342ff
[feat] Implement model-agnostic one-engine eagle3 (#4778)
nv-yilinf Jun 13, 2025
5f2785f
fix: Fix waive list (#5205)
syuoni Jun 13, 2025
82e280f
feat: add multi-node support for Triton with pytorch backend (#5172)
achartier Jun 13, 2025
97657bf
optimize memset before alltoall communication (#5188)
dongxuy04 Jun 14, 2025
3b7b5a5
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part …
nv-guomingz Jun 14, 2025
b99c5ce
Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL…
yunruis Jun 14, 2025
443b2eb
refactor: Speculative decoding buffers (#5091)
Funatiq Jun 14, 2025
0b60da2
feat: large-scale EP(part 7: DeepEP integration) (#4792)
yuantailing Jun 14, 2025
dc52b67
linting(python): Enable ruff on more files (wave 1/N) (#5140)
2ez4bz Jun 14, 2025
1389f5a
feat: Add support for fp8 rowwise quantization (#4876)
achartier Jun 14, 2025
e055af1
chore: improve disagg test failure detection (#4738)
ixlmar Jun 14, 2025
6bce733
perf: avoid dynamic import overhead in is_llm_response with duck typi…
tongyuantongyu Jun 14, 2025
63bc62d
feat: Enable EPLB to existing MoE models (#5203)
syuoni Jun 15, 2025
dce1dcc
feat: Support post_proc for bench (#5122)
kaiyux Jun 15, 2025
159ffc5
fix: fix cuda graph max batch size for spec decoding cases. (#5076)
lfr-0531 Jun 15, 2025
4eade3a
[fix][test] Speedup Nemotron NAS unittests (#5202)
omera-nv Jun 15, 2025
5a01ba5
use cu for fmha_v2 (#4694)
qsang-nv Jun 15, 2025
39bba63
[TRTLLM-4983] feat: enable overlap scheduler between draft forwards (…
lfr-0531 Jun 15, 2025
109c426
Enable trtllm-bench to run LoRA and add basic e2e perf testing capabi…
amitz-nv Jun 15, 2025
c84e41f
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Superjomn Jun 16, 2025
7a5e0fd
[fix] Fix Llama4 min-latency import error (#5209)
nv-yilinf Jun 16, 2025
babdd9c
test: Add json_mode_eval for guided decoding evaluation (#5179)
syuoni Jun 16, 2025
3d22f27
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attent…
ruodil Jun 16, 2025
2848e01
test: add llama4 models for perf test (#5187)
ruodil Jun 16, 2025
9b616db
test: Add fixture to skip tests based on MPI world size (#5028)
yizhang-nv Jun 16, 2025
ef3fdc8
feat: Add w4a8_mxfp4_fp8 quantization recipe. (#4867)
Tracin Jun 16, 2025
0acf231
[Stress test] Add DeepSeek-R1 stress test (#5033)
Wanli-Jiang Jun 16, 2025
dda6416
refactor: Scheduling based on KV cache state (#4865)
Funatiq Jun 16, 2025
1d2b0d3
use file lock to avoid port conflict (#5123)
chuangz0 Jun 16, 2025
4f9fa9f
feat: MoE trtllm backend kernel update (#5183)
rosenrodt Jun 16, 2025
b6ca677
refactor: remove decoder request from decoder interface (#5129)
Funatiq Jun 16, 2025
8445416
Waive L0 tests (#5233)
yiqingy0 Jun 16, 2025
802f22c
test: [CI] Add failed cases into waives.txt (#5221)
xinhe-nv Jun 16, 2025
64b7f04
[test] split nemotron test cases from examples_test_list (#5238)
crazydemo Jun 16, 2025
03f1a6a
Update DeepSeek R1 perf numbers to latest release/0.20 results (#5235)
litaotju Jun 16, 2025
dd29063
[feat] Add llm args to tune python gc threshold (#5141)
nv-yilinf Jun 16, 2025
cea5dd1
[TRTLLM-5835][feat] Optimized Mamba2Mixer prefill (#5128)
tomeras91 Jun 16, 2025
e607768
Speculation: Draft Target in new FW (#4558)
IzzyPutterman Jun 16, 2025
5c18160
chore: Waive CI failure. (#5252)
SimengLiu-nv Jun 16, 2025
c53bc19
[infra] Make test_chunked_prefill faster (#5248)
mikeiovine Jun 16, 2025
a2e8ae1
Update internal cutlass commit. (#5228)
Tracin Jun 17, 2025
bb23483
test: add more pytorch cases in perf test (#5237)
ruodil Jun 17, 2025
546274d
fix ci (#5259)
QiJune Jun 17, 2025
a49ad79
test: [CI] remove closed bugs (#5218)
xinhe-nv Jun 17, 2025
4b82b8b
[TRTLLM-5330] perf: Optimize MoE supplementary kernels for large-scal…
syuoni Jun 17, 2025
134cb66
fix mla test (#5240)
qsang-nv Jun 17, 2025
6a6b9d2
doc: add document of benchmarking for Qwen3 (#5158)
byshiue Jun 17, 2025
faca19c
update setup.py for special cases (#5227)
qsang-nv Jun 17, 2025
517c1ec
move some test cases of TensorRT backend back (#5232)
QiJune Jun 17, 2025
498fadc
[feat] Add EAGLE3 support for Qwen3 (#5206)
nv-yilinf Jun 17, 2025
2ad8758
[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520][test] Add QA …
crazydemo Jun 17, 2025
ccd9adb
CI: move multi-gpu test cases of tensorrt backend to h200 (#5272)
QiJune Jun 17, 2025
dc3861b
refactor: Unify decoder test with e2e workflow (#5239)
Funatiq Jun 17, 2025
13eef64
[feat] Piecewise cuda graph support for MLA (#4467)
liji-nv Jun 17, 2025
8451a87
chore: Mass integration of release/0.20 (#5082)
amirkl94 Jun 17, 2025
44fb3c1
[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Py…
DomBrown Jun 17, 2025
f4cdbfc
None - Some clean-ups for the automation pipeline (#5245)
chzblych Jun 17, 2025
f899c4d
Re-implement LlmResponse in Python to reduce host overhead of pybind …
QiJune Jun 17, 2025
5236bb9
delete cubins (#5274)
qsang-nv Jun 17, 2025
dcf18c4
infra[TRTLLM-5635] remove package stage in CI build (#5075)
niukuo Jun 17, 2025
ff32caf
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#…
EmmaQiaoCh Jun 17, 2025
9bf69c9
[chore] Remove BaseDraftTokenManager (#5251)
mikeiovine Jun 17, 2025
2df9f87
[infra] Report CI authorization errors to PR (#5175)
tburt-nv Jun 17, 2025
7d55c38
Revert "[infra] Report CI authorization errors to PR" (#5298)
tburt-nv Jun 17, 2025
627062c
refactor: Update decoder buffer and logits management (#4450)
Funatiq Jun 18, 2025
e1e5f72
fix: only set _mpi_session if world_size is > 1 (#5253)
achartier Jun 18, 2025
855036d
update LlmRequest.is_dummy property (#5283)
QiJune Jun 18, 2025
41cfcaa
test: update qa test list (#5305)
crazydemo Jun 18, 2025
3c0fecb
CI: extend model weights load time for dsv3 in stress test. (#5275)
dominicshanshan Jun 18, 2025
f501ce5
[fix][test] move deepseek single gpu tests to post merge (#5280)
omera-nv Jun 18, 2025
8f67e36
Waive L0 tests (#5308)
yiqingy0 Jun 18, 2025
e44f768
feat: Add no_kv_cache_reuse option and streaming support for trtllm s…
yizhang-nv Jun 18, 2025
724e495
chore: partition LLM class into TorchLLM and TrtLLM (#4900)
Superjomn Jun 18, 2025
908463a
[feat]: improve performance of XQA-MLA for sm120 (#5087)
lowsfer Jun 18, 2025
ee26965
doc:update contributing md for internal developers (#5250)
nv-guomingz Jun 18, 2025
3b5d916
test: cherry-pick deepseek rcca cases in main branch (#5307)
ruodil Jun 18, 2025
6711ad9
[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM …
hyukn Jun 18, 2025
9ea7bb6
CI: fix TensorRT H200 tests (#5301)
QiJune Jun 18, 2025
3a02489
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Wanli-Jiang Jun 18, 2025
d76bda7
chore: Refine printed info of CHECK_TYPE. (#5295)
bobboli Jun 18, 2025
38547b9
refactor: Introduce ResourceManagerType enum for resource management …
Funatiq Jun 18, 2025
516bd4d
chore: bump version to 0.21.0rc3 (#5309)
ZhanruiSunCh Jun 18, 2025
f599ee6
test: correct unittest rerun behavior (#5273)
tongyuantongyu Jun 18, 2025
a3a4841
Fix rerun step (#5319)
yiqingy0 Jun 18, 2025
375dd0b
Waive L0 (#5311)
yizhang-nv Jun 18, 2025
610a49f
tests: add multi nodes tests (#5196)
xinhe-nv Jun 18, 2025
0623ffe
feat: Add LLGuidance Support for PyTorch Backend (#5214)
jellysnack Jun 18, 2025
b29ac5b
[Infra] Update 5080 and 5090 case condition due to the driver update …
EmmaQiaoCh Jun 18, 2025
00bdd39
chore: Update README.md to expose meet-up info (#5329)
juney-nvidia Jun 18, 2025
d13d2f4
Remove duplicated test cases (#5323)
HuiGao-NV Jun 18, 2025
857108a
Add disagg slurm scripts (#5243)
qiaoxj07 Jun 18, 2025
e5ee5c5
Unwaive disaggregated serving accuracy tests (#5095)
Tabrizian Jun 18, 2025
a1c5704
[feat] Multi-node CI testing support via Slurm (#4771)
yuanjingx87 Jun 18, 2025
a28a152
[fix][test] remove some cpp test cases from h100 (#5335)
omera-nv Jun 18, 2025
5010f87
[fix][test] remove duplicate test runs (#5241)
omera-nv Jun 18, 2025
d25f93c
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#…
achartier Jun 18, 2025
0b6d005
[fix][test] clear cuda cache before unittests automatically (#5121)
omera-nv Jun 18, 2025
3946e79
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Superjomn Jun 18, 2025
1a7c6e7
ci: Split long running jobs into multiple jobs (#5268)
Funatiq Jun 18, 2025
2b23cd5
[feat] Fusion finalize and allreduce for qwenmoe model (#5223)
zongfeijing Jun 19, 2025
6a388b1
chore: remove torch_compile prefix for TorchCompileConfig field membe…
nv-guomingz Jun 19, 2025
6c3210a
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
lfr-0531 Jun 19, 2025
da576bc
Waive L0 test (#5349)
yiqingy0 Jun 19, 2025
decfe2f
chore: bump version to 0.21.0 (#5325)
yiqingy0 Jun 19, 2025
e87cf62
tests: cherry-pick from main branch, add qwen3 test cases and amend t…
ruodil Jun 19, 2025
8686805
[Infra]cherry pick sanity check yml change for 5080 and 5090 from mai…
EmmaQiaoCh Jun 19, 2025
ebc6dbc
doc: cherry pick #5334 (#5368)
MartinMarciniszyn Jun 19, 2025
2d5e202
fix: Fix skip by mpi size fixture (#5355)
yizhang-nv Jun 21, 2025
2b56957
Fix: missing clientId when serializing and deserializing response (cherry…
kaiyux Jun 24, 2025
9e110b2
tests: fix typos in qa test (#5421)
crazydemo Jun 25, 2025
32f50de
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mis…
brb-nv Jun 25, 2025
af58393
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Wanli-Jiang Jun 25, 2025
5e50fcc
test: set enable_attention_dp=True in default deepseek settings (#5461)
ruodil Jun 25, 2025
5cd87be
tests: Set kv cache free memory fraction in test case (#5462)
HuiGao-NV Jun 25, 2025
b6d23d5
[Infra] - Waive failed tests on release/0.21 (#5477)
EmmaQiaoCh Jun 25, 2025
fc64f13
Fix permission for local user issues in NGC docker container. (#5373)
MartinMarciniszyn Jun 25, 2025
87ead4e
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Superjomn Jun 25, 2025
c2799d0
[nvbug/5354825] Fix nougat test image url (#5496)
amukkara Jun 26, 2025
a811077
fix: fix regression in LOCAL_USER (#5517)
ixlmar Jun 26, 2025
30a2a8b
doc: Fix benchmark cmd in disagg scripts (#5516)
kaiyux Jun 26, 2025
312fd47
fix: constrain grepping in docker/Makefile (#5493)
ixlmar Jun 26, 2025
e2054bb
[Infra][release/0.21] - waive failed tests (#5537)
EmmaQiaoCh Jun 27, 2025
b78ad75
ci: unwaive llmapi launch test (#5281)
Superjomn Jun 27, 2025
abb7357
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instruc…
ixlmar Jun 27, 2025
4fc0666
[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0…
venkywonka Jun 27, 2025
647e070
[Infra][release/0.21]Update nccl to 2.27.5 (#5539)
EmmaQiaoCh Jun 29, 2025
d6c81ba
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Superjomn Jun 30, 2025
9fe1dd6
fix:https://nvbugs/5362398 (#5609)
nv-guomingz Jun 30, 2025
1824c44
[nvbug 5300551] test: increase block count in eviction test (#5465)
zhengd-nv Jul 1, 2025
aa0b927
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
yizhang-nv Jul 1, 2025
682b164
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
kaiyux Jul 1, 2025
d5606b0
fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (…
byshiue Jul 2, 2025
92d3a2d
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking ca…
FrankD412 Jul 2, 2025
a3c0cf0
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
brb-nv Jul 3, 2025
2f9d061
[Infra] - Waive failed cases on release/0.21 (#5674)
EmmaQiaoCh Jul 3, 2025
14f938e
Doc: Update invalid hugging face URLs (#5683)
Linda-Stadter Jul 3, 2025
8a8d2e9
[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651)
farazkh80 Jul 3, 2025
2aacdba
[TRTLLM-6100] fix: Nvbug 5356427: autotuned TRTLLM Gen fp8 block scal…
DomBrown Jul 4, 2025
2b66fe8
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735)
brb-nv Jul 4, 2025
53394e0
test: Move some of the test from post merge to pre-merge, update dgx …
yizhang-nv Jul 4, 2025
b0354ef
[5321981] fix: Fix the Llama3.1 405B hanging issue. (#5698)
hyukn Jul 4, 2025
3e44db1
[Infra][nvbugs/5370968] - Unwaive l0 test (#5750)
yiqingy0 Jul 4, 2025
5ac92bb
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTL…
yizhang-nv Jul 4, 2025
518915b
[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558)
Tabrizian Jul 4, 2025
aa4d0f0
[Infra] - Always use x86 image for the Jenkins agent (#5756)
chzblych Jul 6, 2025
6103466
test: fix some test failure and add llama_nemotron models in perf san…
ruodil Jul 7, 2025
9106b5d
fix: Skip rope scaling for local layers in Gemma3 VLM (#5773)
brb-nv Jul 7, 2025
7524c77
[nvbug 5004744][fix] rewrite completion API to avoid repetitive token…
LinPoly Jul 7, 2025
3a58db8
fix _pad_attention_dp_dummy_request (#5583)
QiJune Jul 7, 2025
06f8327
Fix docker cache mount (#5763)
MartinMarciniszyn Jul 7, 2025
4fa9284
[nvbug/5302638][nvbugs/5310314] fix _handle_cancelled_requests (#5532)
QiJune Jul 7, 2025
d47ac4e
cherry pick #5416 (#5776)
QiJune Jul 7, 2025
0a0ac7b
[nvbug 5304752][fix] enhance _check_arguments to filter illegal reque…
LinPoly Jul 7, 2025
97f4c9e
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
Superjomn Jul 7, 2025
5a50e2b
[https://nvbugspro.nvidia.com/bug/5355054] fallback to cubins for fp8…
PerkzZheng Jul 8, 2025
6062dc6
fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 53452…
bobboli Jul 8, 2025
f8b4077
[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP (#5789)
QiJune Jul 8, 2025
6d7a2cb
fix: [https://nvbugs/5351130][https://nvbugs/5333654] Unwaive for bug…
bobboli Jul 8, 2025
39ad602
doc: Update gb200 doc (#5840)
yizhang-nv Jul 8, 2025
cbcc55e
test: remove duplicate cases in perf sanity test (#5870)
ruodil Jul 9, 2025
2e21e34
[nvbug 5327706][fix] fix mgmn postprocess error (#5835)
LinPoly Jul 9, 2025
fd94d3c
[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761)
Funatiq Jul 9, 2025
ce048ec
cherry-pick: [fix: nvbugs/5355493] Correctly clamp max sequence len t…
netanel-haber Jul 9, 2025
d9e265d
[https://nvbugs/5355316] fix: update torch.compile option to fix trit…
dc3671 Jul 10, 2025
ff9aabb
test: Add Gemma3 unit tests to CI in release/0.21 (#5899)
brb-nv Jul 10, 2025
cd7aeec
tests: Fix lora perf test (#5875)
amirkl94 Jul 10, 2025
8b7422c
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction…
bobboli Jul 10, 2025
8429c8b
chore: Port leftover 0.20 (#5907)
amirkl94 Jul 10, 2025
bfa917f
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Superjomn Jul 10, 2025
aeea5b3
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
nekorobov Jul 10, 2025
e831673
fix: timeout and broken pipe in disagg and worker tests (#5827)
zhengd-nv Jul 11, 2025
4905cac
[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup (…
lfr-0531 Jul 12, 2025
bed78a2
fix: fix index out of bounds error in spec decoding (#5954)
lfr-0531 Jul 14, 2025
332a65b
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
yizhang-nv Jul 14, 2025
2e7da20
[fix] Release slots with spec decode + disagg (#5975)
Tabrizian Jul 14, 2025
63f4a7a
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation…
nv-guomingz Jul 15, 2025
69a15c8
[None] - Waive L0 tests (#6082)
yiqingy0 Jul 16, 2025
bce13bb
Cherry Pick: PR #6076 (#6088)
ZhanruiSunCh Jul 16, 2025
f6db521
add release notes for 0.21 release (#6049)
QiJune Jul 16, 2025
4d0bcbc
fix: Fix triton backend build [nvbug 5396469] (#6098)
pcastonguay Jul 16, 2025
eeca3ad
[None][infra] Cherry-pick #6128 and #6130 from main branch (#6151)
chzblych Jul 18, 2025
9323de6
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
byshiue Jul 18, 2025
ab4e178
[fix]: Revert commit 388b491 (#6143)
LinPoly Jul 18, 2025
