[test] Use LLM API for Nemotron-H correctness test #5097
… fails Signed-off-by: Tomer Asida <[email protected]>
Signed-off-by: Tomer Asida <[email protected]>
…failed or succeeded (2) don't add BOS token to match expected outputs Signed-off-by: Tomer Asida <[email protected]>
…y state_indices during forward pass. Now LLM API test passes Signed-off-by: Tomer Asida <[email protected]>
…-LLM into fix-nemotron-h-warmup Signed-off-by: Tomer Asida <[email protected]>
…eference logprobs from commit 5ce1102 after fix Signed-off-by: Tomer Asida <[email protected]>
…es llm api Signed-off-by: Tomer Asida <[email protected]>
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
/bot run |
PR_Github #8337 [ run ] triggered by Bot |
LGTM
LGTM
PR_Github #8337 [ run ] completed with state |
…, and other tests don't necessarily clean memory after themselves Signed-off-by: Tomer Asida <[email protected]>
/bot run |
PR_Github #8358 [ run ] triggered by Bot |
PR_Github #8358 [ run ] completed with state |
/bot run |
PR_Github #8419 [ run ] triggered by Bot |
/bot run |
PR_Github #8440 [ run ] triggered by Bot |
PR_Github #8419 [ run ] completed with state |
PR_Github #8440 [ run ] completed with state |
/bot run |
PR_Github #8493 [ run ] triggered by Bot |
Use of get context & generation logits LGTM.
Note that logprobs support was merged this morning: #4836 - using it can probably save some more code from this test
PR_Github #8493 [ run ] completed with state |
/bot run |
PR_Github #8530 [ run ] triggered by Bot |
PR_Github #8530 [ run ] completed with state |
/bot run |
PR_Github #8535 [ run ] triggered by Bot |
PR_Github #8535 [ run ] completed with state |
The Nemotron-H correctness test doesn't just check that the completions are as expected; it also validates that the logprobs aren't too far from reference logprobs. Initially, the test was implemented using a manual generation for-loop, since the LLM API didn't support returning context and generation logprobs.
After PRs #4538 and #4819 added context and generation logits support to the LLM API, it can now be used to compare logprobs as well, removing the need for a manual generation loop. Using the LLM API in the unit test is also preferable, since that's the standard way to use a model in TRTLLM.
Therefore, this PR merges the old Nemotron-H correctness and LLM API unit tests, so that the Nemotron-H correctness test now runs through the LLM API.
The reference generation logprobs were also updated, since it turns out there was a bug in the position_ids in the old manual generation loop. This further emphasizes why standard APIs should be used instead of custom code whenever possible.
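For readers unfamiliar with this kind of check, the logprob comparison described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code in this PR: the helper names (`log_softmax`, `token_logprobs`, `assert_logprobs_close`) and the `atol` value are hypothetical, and it assumes the per-step logits have already been obtained from the model as plain lists of floats.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a single vocabulary row."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def token_logprobs(logits_per_step, token_ids):
    """Pick the logprob of the chosen token at every step.

    logits_per_step: one list of vocabulary logits per position (hypothetical layout).
    token_ids: the token actually present or generated at each position.
    """
    return [log_softmax(row)[tok] for row, tok in zip(logits_per_step, token_ids)]


def assert_logprobs_close(actual, reference, atol=0.1):
    """Fail if any logprob deviates from its reference by more than atol.

    atol=0.1 is an illustrative tolerance, not the value used by the real test.
    """
    assert len(actual) == len(reference), "length mismatch"
    for i, (a, r) in enumerate(zip(actual, reference)):
        if abs(a - r) > atol:
            raise AssertionError(f"step {i}: logprob {a:.4f} vs reference {r:.4f}")


# Toy usage with a two-token vocabulary and two steps.
logits = [[0.0, 0.0], [1.0, 3.0]]
tokens = [0, 1]
logprobs = token_logprobs(logits, tokens)
assert_logprobs_close(logprobs, [math.log(0.5), -math.log(1.0 + math.exp(-2.0))])
```

Comparing logprobs with a tolerance, rather than only comparing completions, catches numerical drift that does not yet change the argmax token.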