

bobboli
Collaborator

@bobboli bobboli commented May 13, 2025

PR title

Please write the PR title following this template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, for a PR that adds a new cache manager feature under Jira ticket TRTLLM-1000, the title would be:

[TRTLLM-1000][feat] Support a new feature about cache manager
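As an illustration only, the title format above can be checked with a regular expression. The helper name `check_pr_title` is hypothetical and is not part of any repository tooling; the type group accepts any lowercase word because the template lists `fix/feat/doc/infra/...` as examples rather than a closed set.

```python
import re

# Hypothetical checker for the "[ticket][type] summary" title template.
# The ticket group accepts anything up to the first "]" (so links work too).
TITLE_PATTERN = re.compile(
    r"^\[(?P<ticket>[^\]]+)\]\[(?P<type>[a-z]+)\] (?P<summary>\S.*)$"
)


def check_pr_title(title: str):
    """Return (ticket, type, summary) if the title matches the template, else None."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    return match.group("ticket"), match.group("type"), match.group("summary")


print(check_pr_title("[TRTLLM-1000][feat] Support a new feature about cache manager"))
print(check_pr_title("Support a new feature about cache manager"))
```

The first call returns the three parsed fields; the second returns `None` because the bracketed prefix is missing.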

Description

XQA is not enabled when history_length < kMinHistoryTokensPerBlock.

Also log the reason when XQA is not used, to help with debugging.
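A minimal sketch of the dispatch rule described above, assuming one `history_length` per request. The function name and the constant's value are placeholders for illustration only; the real check lives in the C++ attention code and may be combined with other conditions.

```python
import logging

logger = logging.getLogger("attention_dispatch")

# Placeholder value: the real kMinHistoryTokensPerBlock is defined in the C++ sources.
K_MIN_HISTORY_TOKENS_PER_BLOCK = 128


def select_attention_kernel(history_length: int) -> str:
    """Hypothetical helper: pick "xqa" or "mmha" and log why XQA was skipped."""
    if history_length < K_MIN_HISTORY_TOKENS_PER_BLOCK:
        logger.info(
            "XQA not used: history_length (%d) < kMinHistoryTokensPerBlock (%d)",
            history_length,
            K_MIN_HISTORY_TOKENS_PER_BLOCK,
        )
        return "mmha"
    return "xqa"


logging.basicConfig(level=logging.INFO)
print(select_attention_kernel(64))   # mmha, and the reason is logged
print(select_attention_kernel(512))  # xqa
```

Logging the reason at the decision point means a user who sees unexpected kernel selection can tell from the log which condition ruled XQA out.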

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping validation without due care can break top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since skipping validation without due care can break top of tree.
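To make the command grammar above concrete, here is a rough `argparse` model of it. This is not the bot's actual implementation; it only encodes the subcommands and flags listed in the help text, and it assumes `--stage-list`, `--gpu-type`, and `--extra-stage` take comma-separated strings, as the examples suggest.

```python
import argparse
import shlex


def comma_list(value: str) -> list:
    """Split "A30, H100_PCIe" into ["A30", "H100_PCIe"]."""
    return [item.strip() for item in value.split(",")]


def build_bot_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the /bot help text (not the real bot)."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Launch build/test pipelines.")
    for flag in ("--disable-fail-fast", "--skip-test", "--add-multi-gpu-test",
                 "--only-multi-gpu-test", "--disable-multi-gpu-test", "--post-merge"):
        run.add_argument(flag, action="store_true")
    for flag in ("--stage-list", "--gpu-type", "--extra-stage"):
        run.add_argument(flag, type=comma_list)

    sub.add_parser("kill", help="Kill all running builds for the pull request.")

    skip = sub.add_parser("skip", help="Skip testing for the latest commit.")
    skip.add_argument("--comment", required=True)  # the help text marks this as required

    sub.add_parser("reuse-pipeline", help="Reuse a previous pipeline.")
    return parser


parser = build_bot_parser()
args = parser.parse_args(shlex.split('run --disable-fail-fast --gpu-type "A30, H100_PCIe"'))
print(args.command, args.disable_fail_fast, args.gpu_type)
```

Modeling `--comment` as required on `skip` reflects the help text's rule that a reason must accompany a skipped build.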

@bobboli
Collaborator Author

bobboli commented May 14, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5192 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5192 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3789 completed with status: 'FAILURE'

@bobboli
Collaborator Author

bobboli commented May 28, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #6759 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6759 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4927 completed with status: 'FAILURE'

@bobboli
Collaborator Author

bobboli commented Jun 5, 2025

We are failing tests/unittest/llmapi/test_llm.py::test_generate_with_seed. That test does the following:

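    generated_texts = []
    # First call: 10 identical prompts, the first 5 with a fixed seed
    for output in llm.generate(prompts, sampling_params):
        generated_texts.append(output.outputs[0].text)
    # Second call with the same prompts and sampling params
    for output in llm.generate(prompts, sampling_params):
        generated_texts.append(output.outputs[0].text)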

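where prompts contains 10 identical inputs, and the first 5 of the 10 share a fixed seed. The seeded requests should reproduce the same text within and across both calls (1 unique response), while the 10 unseeded requests should each differ (10 unique responses), so generated_texts is expected to contain 11 unique responses.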
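The expected count can be checked with a small simulation. The labels are stand-ins for generated texts, under the assumption stated above: every seeded request reproduces the same text in both calls, and every unseeded request produces a distinct text.

```python
def expected_outputs(num_calls: int = 2, batch: int = 10, seeded: int = 5) -> list:
    """Label each output: all seeded requests share one label, unseeded ones are unique."""
    outputs = []
    for call in range(num_calls):
        for i in range(batch):
            if i < seeded:
                outputs.append("seeded")               # same seed -> same text everywhere
            else:
                outputs.append(f"random-{call}-{i}")   # no seed -> distinct text
    return outputs


texts = expected_outputs()
print(len(texts), len(set(texts)))  # 20 11
```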

However, the actual output looks like this:

generated_texts[0]: -It hosts international culture trade that celebrs France being number zero number best vacanza France has to. Which kind do of travel do Tourst believe is hittn Most fashion centery fthe U . a s trimek f SAN pins Hollland r n igenous wighti the new a gernial annd c m u sy weinest place m on The th e United Statc Nit an its I . R Y g p in h
generated_texts[1]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[2]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[3]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[4]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[5]: officially determined in part a procession held Sunday where kennith moss burrfell kurt, amer ounce sterenock anaculus mum scott brown how, who gave what every last bit hey aftonso pige fliped about what is alinea par at and yi orrings to who rents new edsel new yem alpine scuttet that fy bull by was liz loasik whens unheaps
generated_texts[6]: founded _ ,? | Did Henry never show understanding
generated_texts[7]: ? Duffeens. They seem poo the fru ine they bile... It&mmpostcensiary stif and caned every nowher wept for wonders when not wort ungrits noe &1B-Ran or so (Hanken no one shuns this b-deor aweinic hizmint ynga worm hevees' ance trivile thinn.) You lately
generated_texts[8]: - in both cities In addition They live And breabkak- they die For other days are called after days such names Not capital Days But capitls Just. "Doña Marina Hotel Is This Tales: To Crono On Dilbert Reciplot This Might Sound Bare Again From 0 Day Capital Rimr Do An Aunt For Wanbong Lamp It Is Saan Kow Lanka Forgiving Poled Iyawawawa By V
generated_texts[9]: officially changed on "From Nohel Ben Issseryeïld - King Aricandul" An alternate retreading between one and that an orac
Tribuno, n . [For no sooner did [J.] Barak discover], Latinize no shorter'ther heel [By that]. Cp Travert to .I in no T,r hrt HIre e l m d; r thirt[1i & thit r A .]
generated_texts[10]: -It hosts international culture trade that celebrs France being number zero number best vacanza France has to. Which kind do of travel do Tourst believe is hittn Most fashion centery fthe U . a s trimek f SAN pins Hollland r n igenous wighti the new a gernial annd c m u sy weinest place m on The th e United Statc Nit an its I . R Y g p in h
generated_texts[11]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[12]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[13]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[14]: - Cap(4 deciminator of radius, divided one or times twice ratio value)/Capital The longitudeThe Capital Caps- Square or Lines based off - Erosiv the points A circle, rounds
By making adjustamto keep right an. Pup, you cannot calculate area centums ais in your study at every step on all three. Coff(half to sixth). Here , the slope, as also difference it one by itself makes third more positive: S
generated_texts[15]: Parallel with one in east which becomes Rhonda East? - the third eastern terminium Paris(from the River Rhénédroughnei etne)- east Eggw and Parlev.Areas...Culturalequivilientes en Chilea¿qn quintaniaco?QQ=5A¿LAs. A/ashe1 & I?2 (junipe)/ (bree);3n n √ hx8z^< iJ
generated_texts[16]: ', lineWithTab)] ] ), '''Les Inauri''ntables Soudais Cédillon’en dindan lés Præes'il fwip du Cap - Aux dean''doutors.'Il descd et es dansas in La Garantle du Cas-sape or et dans Patej - Andalsont atte cest à Antenou-te an tandons (vues l. Lantres
generated_texts[17]: An am lankhydron is commonly assumed to point the f ject d
0 N an ac, in S a th Mp g oodly shaping stly m , D stant W iper in this atrocuosi f b E st alien ed oedef and that thi JE M and y.I oncq II atten, no E d J LBH stet al at uelh nc thp'th
generated_texts[18]: is is It sounds similar enough too 'Is in to an island named (cats') because my lump says them by here!' To: is... France'ArenaeasApr., Osp.:Ba ' 4! My eyes as. L 8h Is with f...France bans, they have tried Is this also.. Lakes! to a little red... Bs2i Mp i.... And of Clyce that of! We Are The of
generated_texts[19]: Nice – https I bet he must stay well content the wool sales increase? ■I love I saw a lovak place but sad news? Yes but nice tours are included do take advantage – This man seems like shhh, someone at Pinter (who has also worked all tones he is smold with more recently but i don get better or an image from the airship that comes round and ¾ it out an item , bees which hays some interesting comments after?

Here [0] differs from [1:5], and [10] differs from [11:15], giving 12 unique responses rather than 11. Note that [0] matches [10] and [1:5] match [11:15], so each output is reproducible across calls; only the request that runs alone diverges. The reason is that with in-flight batching (IFB), the first generation iteration of each generate() call contains only one generation request, so MMHA is used for it, whereas the other 4 seeded requests are batched together and can use XQA instead.
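One plausible way to picture the mismatch, stated as an assumption rather than a description of the real kernels: two mathematically equivalent attention implementations can accumulate in a different order, and floating-point addition is not associative. If the kernel is chosen by how many generation requests share an iteration, the lone first request and the batched ones can end up with bitwise different values and, from there, different sampled tokens even under the same seed. The kernel names below are only labels, and the selection rule is hypothetical.

```python
def accumulate_mmha_like(values):
    """Stand-in for one kernel: accumulate left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def accumulate_xqa_like(values):
    """Stand-in for another kernel: accumulate in the opposite order."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


def pick_kernel(num_generation_requests: int):
    """Hypothetical rule: a lone request takes the MMHA path, batches take XQA."""
    return accumulate_mmha_like if num_generation_requests == 1 else accumulate_xqa_like


partial_scores = [0.1, 0.2, 0.3]  # toy values, not real attention scores
alone = pick_kernel(1)(partial_scores)
batched = pick_kernel(4)(partial_scores)
print(alone, batched, alone == batched)  # 0.6000000000000001 0.6 False
```

The difference is a single rounding step, but a sampler compares probabilities against thresholds, so a tiny shift can flip a token and the divergence then grows over the rest of the sequence.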

So this is a trade-off between out-of-the-box (OOTB) performance and API robustness. @ming-wei @lowsfer @Superjomn @syuoni

  • Do you think it is a good idea to switch the attention kernel based on batch size? Users may see different outputs depending on batch size, even with fixed SamplingParams.
  • If that is acceptable, we should deprecate this test. Are you okay with that?

@bobboli
Collaborator Author

bobboli commented Jun 10, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8321 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8321 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6026 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@bobboli bobboli merged commit 1b79041 into NVIDIA:main Jun 11, 2025
3 checks passed