-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e #6140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Zhenhuan Chen <[email protected]>
WalkthroughThe changes refactor how outputs are handled and accessed in the scaffolding framework. Output strings are replaced by result objects or their text fields, redundant classes and attributes are removed, and return types are updated for majority vote logic. Associated tests and examples are updated to reflect these structural changes. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Controller
participant Task
participant Result
User->>Controller: Initiate majority vote or best-of-N
Controller->>Task: Collect candidate outputs
Task->>Result: Generate outputs
Controller->>Result: Evaluate outputs (majority/best)
Controller->>Task: Assign selected result object
Task->>User: Return selected output text
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tensorrt_llm/scaffolding/math_utils.py (1)
61-76
: Verify the correctness of the manual counting implementation and majority index calculation.The refactoring replaces
collections.Counter
with manual counting, which is functionally correct. However, the majority index calculation usingfilter
andenumerate
could be simplified and made more readable.Consider this cleaner implementation:
- answer_counts = {} - for answer in valid_answers: - answer_counts[answer] = answer_counts.get(answer, 0) + 1 - majority_answer = max(answer_counts, key=answer_counts.get) - majority_index = next( - filter(lambda x: x[1] == majority_answer, - enumerate(extract_answers)))[0] + answer_counts = {} + for answer in valid_answers: + answer_counts[answer] = answer_counts.get(answer, 0) + 1 + majority_answer = max(answer_counts, key=answer_counts.get) + majority_index = extract_answers.index(majority_answer)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
examples/scaffolding/run_best_of_n_with_reward.py
(1 hunks)examples/scaffolding/run_majority_vote_aime24.py
(1 hunks)tensorrt_llm/scaffolding/__init__.py
(0 hunks)tensorrt_llm/scaffolding/controller.py
(3 hunks)tensorrt_llm/scaffolding/math_utils.py
(1 hunks)tensorrt_llm/scaffolding/result.py
(1 hunks)tensorrt_llm/scaffolding/scaffolding_llm.py
(1 hunks)tensorrt_llm/scaffolding/task.py
(2 hunks)tests/integration/test_lists/waives.txt
(0 hunks)tests/unittest/scaffolding/test_bench.py
(3 hunks)tests/unittest/scaffolding/test_parallel_process.py
(0 hunks)tests/unittest/scaffolding/test_task_collection.py
(0 hunks)
💤 Files with no reviewable changes (4)
- tensorrt_llm/scaffolding/init.py
- tests/unittest/scaffolding/test_parallel_process.py
- tests/integration/test_lists/waives.txt
- tests/unittest/scaffolding/test_task_collection.py
🧰 Additional context used
🧬 Code Graph Analysis (5)
examples/scaffolding/run_best_of_n_with_reward.py (3)
tensorrt_llm/scaffolding/task.py (1)
outputs
(77-78)tensorrt_llm/scaffolding/result.py (1)
outputs
(29-30)tests/integration/defs/accuracy/test_disaggregated_serving.py (1)
outputs
(37-38)
examples/scaffolding/run_majority_vote_aime24.py (2)
tensorrt_llm/scaffolding/task.py (3)
result
(69-70)result
(73-74)outputs
(77-78)tensorrt_llm/scaffolding/math_utils.py (1)
extract_answer_from_boxed
(14-52)
tensorrt_llm/scaffolding/result.py (1)
tensorrt_llm/executor/result.py (1)
GenerationResult
(398-547)
tests/unittest/scaffolding/test_bench.py (1)
tensorrt_llm/scaffolding/task.py (2)
result
(69-70)result
(73-74)
tensorrt_llm/scaffolding/controller.py (2)
tensorrt_llm/scaffolding/task.py (2)
result
(69-70)result
(73-74)tensorrt_llm/scaffolding/math_utils.py (1)
get_digit_majority_vote_result
(78-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (21)
tensorrt_llm/scaffolding/math_utils.py (1)
82-88
: Approve the simplification of digit validation logic.The removal of the boxed answer extraction step before digit validation is a good simplification. The function now correctly validates digits after extraction and returns the appropriate tuple format.
examples/scaffolding/run_best_of_n_with_reward.py (1)
63-63
: Output access pattern correctConfirmed that the first element of the
outputs
list is always the text output (viaInferRequestedOutput("text_output")
), soresults[0].outputs[0].text
correctly retrieves the generated text. No further changes needed.tensorrt_llm/scaffolding/scaffolding_llm.py (1)
85-85
: Good defensive programming approach.Using
getattr(task, 'streaming', False)
instead of direct attribute access prevents potentialAttributeError
exceptions if thestreaming
attribute doesn't exist on task objects. This is a safe and appropriate change.examples/scaffolding/run_majority_vote_aime24.py (1)
104-105
: Verified removal ofresult.result()
is safe and new outputs access is correct
–ScaffoldingLlm.generate()
returns fully resolved results (internally awaits/collects eachScaffoldingResult
), so an explicitresult.result()
is redundant.
–ScaffoldingResult.outputs
is a@property
that returnscur_output.outputs
, andoutputs[0].text
correctly accesses the first completion’s text.
No changes needed.tensorrt_llm/scaffolding/result.py (1)
12-12
: Approve the explicit typing of cur_output attribute.The explicit typing of
cur_output
asGenerationResult
improves type safety and is consistent with the refactoring to remove the redundantScaffoldingOutput
class.tests/unittest/scaffolding/test_bench.py (1)
59-59
: Verify the cur_output property access.The change from
results[0].output.output_str
toresults[0].cur_output
aligns with the refactoring away from theScaffoldingOutput
class. This looks correct assumingcur_output
properly exposes the generated text.Please verify that the
cur_output
property is correctly implemented in the result object and returns the expected string value.tensorrt_llm/scaffolding/task.py (9)
72-75
: Good simplification of the result setter.Removing the redundant
_outputs
caching and keeping the setter focused on its primary responsibility improves code clarity and maintainability.
76-78
: Good addition of the outputs property.The new
outputs
property provides clean access to the underlying GenerationResult outputs with appropriate null checking.
81-82
: Correct update to output_tokens property.The property now correctly accesses token_ids from the GenerationResult with proper null checking.
85-86
: Correct update to output_str property.The property now correctly accesses the text from the GenerationResult outputs with proper null checking.
88-91
: Correct update to output_str setter.The setter now correctly modifies the text field in the GenerationResult with appropriate assertion checking.
94-96
: Correct update to cumulative_logprob property.The property now correctly accesses cumulative_logprob from the GenerationResult outputs with proper null checking.
99-100
: Correct update to logprobs property.The property now correctly accesses logprobs from the GenerationResult outputs with proper null checking.
103-104
: Correct update to context_logits property.The property now correctly accesses context_logits from the GenerationResult with proper null checking.
114-115
: Correct update to create_scaffolding_output method.The method now correctly returns the GenerationResult directly, which aligns with the refactoring away from the ScaffoldingOutput class.
tensorrt_llm/scaffolding/controller.py (6)
4-4
: Correct addition of Tuple import.The Tuple import is correctly added to support the new return type annotation for the majority_vote method.
234-235
: Correct update to majority_vote call.The tuple unpacking correctly handles the updated return type from the majority_vote method, providing access to both the index and the answer.
237-237
: Correct update to assertion.The assertion now correctly checks the type of the unpacked majority_answer variable, ensuring type safety.
239-239
: Excellent improvement to result assignment.Assigning the complete
result
from the majority choice task preserves all output information (tokens, logits, etc.) rather than just the string. This is a more comprehensive and correct approach than the previous string-only assignment.
241-241
: Correct update to return type annotation.The return type annotation now correctly reflects that the method returns both the index and the answer as a tuple.
296-296
: Consistent improvement to result assignment.The change to assign the complete
result
from the best task is consistent with the MajorityVoteController refactoring and preserves all output information rather than just the string.
PR_Github #12195 [ run ] triggered by Bot |
PR_Github #12195 [ run ] completed with state |
…in test_e2e (NVIDIA#6140) Signed-off-by: Zhenhuan Chen <[email protected]>
…in test_e2e (NVIDIA#6140) Signed-off-by: Zhenhuan Chen <[email protected]>
…in test_e2e (NVIDIA#6140) Signed-off-by: Zhenhuan Chen <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
Summary by CodeRabbit
Bug Fixes
Refactor
Tests
Chores
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message.See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md
.kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.