[test] Use LLM API for Nemotron-H correctness test #5097

tomeras91 · 2025-06-10T16:57:31Z

The Nemotron-H correctness test doesn't just check that the completions are as expected; it also validates that the logprobs aren't too far from reference logprobs. Initially, the test was implemented using a manual generation for-loop, since the LLM API didn't support returning context and generation logprobs.

After PRs #4538 and #4819 added context and generation logits support to the LLM API, it can now be used to compare logprobs as well, removing the need for a manual generation loop. Using the LLM API in the unittest is also preferable, since that's the standard way to use a model in TRTLLM.
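As a rough illustration of the kind of check described above, here is a minimal, dependency-free sketch of turning per-step logits into logprobs for the chosen tokens and comparing them to reference values within a tolerance. It is not the test code from this PR: the helper names (`log_softmax`, `token_logprobs`, `assert_logprobs_close`), the toy logits, the reference values, and the tolerance are all made up for the example. In the real test the logits would come from the LLM API outputs rather than hard-coded lists.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def token_logprobs(step_logits, token_ids):
    """Pick the logprob of the chosen token at every step.

    step_logits: one list of vocabulary logits per step (hypothetical toy data).
    token_ids:   the token chosen at each step.
    """
    return [log_softmax(row)[tok] for row, tok in zip(step_logits, token_ids)]


def assert_logprobs_close(actual, reference, atol):
    """Fail if any logprob deviates from its reference by more than atol."""
    assert len(actual) == len(reference), "length mismatch"
    for i, (a, r) in enumerate(zip(actual, reference)):
        assert abs(a - r) <= atol, f"step {i}: {a:.4f} vs reference {r:.4f}"


# Toy data: a 3-token vocabulary and two steps (values are illustrative only).
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
token_ids = [0, 2]
reference = [-0.24, -0.11]  # hypothetical reference logprobs

actual = token_logprobs(step_logits, token_ids)
assert_logprobs_close(actual, reference, atol=0.01)
```

Comparing with an absolute tolerance rather than exact equality keeps the check robust to small numerical differences between runs or hardware, while still catching real regressions.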

Therefore, this PR merges the old Nemotron-H correctness and LLM API unittests, so that the Nemotron-H correctness test now uses the LLM API.

The reference generation logprobs were also updated, since it turned out there was a bug in the position_ids in the old manual generation loop. This further emphasizes why standard APIs should be used instead of custom code whenever possible.
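For context on how easy it is to get position ids wrong in a hand-written decode loop, here is a small generic sketch of the usual bookkeeping. It is not the removed test code, and the PR does not say in which direction the old loop was off, so treat this only as an illustration. The function name `manual_decode_positions` and its parameters are hypothetical. The assumption is the common convention that the prompt occupies positions `0..n-1` and each generated token fed back into the model takes the next position.

```python
def manual_decode_positions(prompt_len, num_new_tokens):
    """Return the position ids passed to the model at each forward call.

    The first call is the prefill over the whole prompt and yields the first
    new token. Every later call feeds back only the most recent token, whose
    position is (current sequence length - 1).
    """
    positions = [list(range(prompt_len))]  # prefill: 0 .. prompt_len - 1
    seq_len = prompt_len
    for _ in range(num_new_tokens - 1):
        seq_len += 1  # the token sampled by the previous call is appended
        # Using seq_len here instead of seq_len - 1 is a typical off-by-one.
        positions.append([seq_len - 1])
    return positions


positions = manual_decode_positions(prompt_len=4, num_new_tokens=3)
print(positions)  # [[0, 1, 2, 3], [4], [5]]
```

A quick sanity check for such a loop is that the flattened position ids form one contiguous range with no gaps or repeats.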

Copilot

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

tomeras91 · 2025-06-10T16:58:07Z

FYI @vegaluisjose @suyoggupta

tomeras91 · 2025-06-10T17:00:59Z

/bot run

tensorrt-cicd · 2025-06-10T17:06:55Z

PR_Github #8337 [ run ] triggered by Bot

vegaluisjose

LGTM

omera-nv

LGTM

tensorrt-cicd · 2025-06-10T22:07:36Z

PR_Github #8337 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6039 completed with status: 'FAILURE'

tomeras91 · 2025-06-10T23:28:45Z

/bot run

tensorrt-cicd · 2025-06-10T23:34:14Z

PR_Github #8358 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-11T00:37:45Z

PR_Github #8358 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6052 completed with status: 'FAILURE'

tomeras91 · 2025-06-11T07:07:20Z

/bot run

tensorrt-cicd · 2025-06-11T07:25:29Z

PR_Github #8419 [ run ] triggered by Bot

tomeras91 · 2025-06-11T07:49:54Z

/bot run

tensorrt-cicd · 2025-06-11T07:58:05Z

PR_Github #8440 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-11T07:58:07Z

PR_Github #8419 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-11T13:24:58Z

PR_Github #8440 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6119 completed with status: 'FAILURE'

tomeras91 · 2025-06-11T13:25:43Z

/bot run

tensorrt-cicd · 2025-06-11T13:31:32Z

PR_Github #8493 [ run ] triggered by Bot

amitz-nv

Use of get context & generation logits LGTM.
Note that logprobs support was merged this morning: #4836 - using it can probably save some more code from this test

tensorrt-cicd · 2025-06-11T18:58:17Z

PR_Github #8493 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6155 completed with status: 'FAILURE'

tomeras91 · 2025-06-11T19:13:51Z

/bot run

tensorrt-cicd · 2025-06-11T19:20:07Z

PR_Github #8530 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-11T19:20:57Z

PR_Github #8530 [ run ] completed with state FAILURE

tomeras91 · 2025-06-11T19:32:50Z

/bot run

tensorrt-cicd · 2025-06-11T19:38:47Z

PR_Github #8535 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-12T03:57:47Z

PR_Github #8535 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6188 completed with status: 'SUCCESS'

tomeras91 added 19 commits June 5, 2025 11:13

Add unittest for Nemotron-H using the pytroch LLM API, that currently…

c70cd1a

… fails Signed-off-by: Tomer Asida <[email protected]>

use pytest instead of unittest in Nemotron-H correctness test

1e5b107

Signed-off-by: Tomer Asida <[email protected]>

Fix Nemotron-H LLM API test - (1) call shutdown() regardless if test …

b650a85

…failed or succeeded (2) don't add BOS token to match expected outputs Signed-off-by: Tomer Asida <[email protected]>

Deal with warmup requests better in Mamba2Mixer forward - assign dumm…

966f3a7

…y state_indices during forward pass. Now LLM API test passes Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'NVIDIA:main' into fix-nemotron-h-warmup

02d13c7

Merge branch 'main' into fix-nemotron-h-warmup

61c3db8

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'fix-nemotron-h-warmup' of github.com:tomeras91/TensorRT…

cb80c01

…-LLM into fix-nemotron-h-warmup Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

8d3d4a9

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

a19bf8c

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

21e78c7

Signed-off-by: Tomer Asida <[email protected]>

clear memory between tests to avoid OOM on A30

02f0817

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

b3bc5e1

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into llm-api-for-nemotron-h-correctness-test

2ffed74

Signed-off-by: Tomer Asida <[email protected]>

WIP: Add correctness test using the LLM API

64b1a6d

Signed-off-by: Tomer Asida <[email protected]>

fix: position_ids was off by one during manual decode. Fix + update r…

7023357

…eference logprobs from commit 5ce1102 after fix Signed-off-by: Tomer Asida <[email protected]>

update reference logprobs in llm api test as well. Now it passes

61c919c

Signed-off-by: Tomer Asida <[email protected]>

remove old correctness test and llm api test. Now correctness test us…

c06a384

…es llm api Signed-off-by: Tomer Asida <[email protected]>

remove debug prints

54c3bd8

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into llm-api-for-nemotron-h-correctness-test

8280bb8

Signed-off-by: Tomer Asida <[email protected]>

tomeras91 requested a review from Copilot June 10, 2025 16:57

Copilot AI reviewed Jun 10, 2025

tomeras91 requested review from amitz-nv and omera-nv June 10, 2025 16:57

tomeras91 changed the title ~~[feat] Use LLM API for Nemotron-h correctness test~~ [test] Use LLM API for Nemotron-h correctness test Jun 10, 2025

tomeras91 changed the title ~~[test] Use LLM API for Nemotron-h correctness test~~ [test] Use LLM API for Nemotron-H correctness test Jun 10, 2025

vegaluisjose approved these changes Jun 10, 2025

omera-nv approved these changes Jun 10, 2025

clear torch cuda cache before test since it's memory intensive on A30…

2b92abb

…, and other tests don't necessarily clean memory after themselves Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into llm-api-for-nemotron-h-correctness-test

924251a

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into llm-api-for-nemotron-h-correctness-test

2e4f1b3

Signed-off-by: Tomer Asida <[email protected]>

amitz-nv approved these changes Jun 11, 2025

shaharmor98 merged commit 06d9f1e into NVIDIA:main Jun 12, 2025
3 checks passed

tomeras91 deleted the llm-api-for-nemotron-h-correctness-test branch June 12, 2025 07:47


6 participants