Skip to content

Conversation

jaedeok-nvidia
Copy link
Collaborator

PyExecutor._enqueue_responses now accepts a list of LlmResponse instead of a dictionary mapping req_id to response. Since each response already contains its request_id, using req_id as a dictionary key is redundant and causes issues when num_return_sequences > 1, where multiple responses share the same request_id.

This change aligns PyExecutor with C++ Executor's behavior, which returns std::vector. The previous dictionary-based approach would overwrite responses with the same request_id, losing all but the last response. With this fix, all responses for multi-sequence generation are properly preserved.

Changes:

  • _enqueue_responses now accepts List[LlmResponse] instead of Dict[int, LlmResponse]
  • All callers updated: _handle_errors, _handle_cancelled_requests, _handle_first_token_response, _handle_responses

PyExecutor._enqueue_responses now accepts a list of LlmResponse instead of a dictionary mapping req_id to response. Since each response already contains its request_id, using req_id as a dictionary key is redundant and causes issues when num_return_sequences > 1, where multiple responses share the same request_id.

This change aligns PyExecutor with C++ Executor's behavior, which returns std::vector<Response>. The previous dictionary-based approach would overwrite responses with the same request_id, losing all but the last response. With this fix, all responses for multi-sequence generation are properly preserved.

Changes:
- _enqueue_responses now accepts List[LlmResponse] instead of Dict[int, LlmResponse]
- All callers updated: _handle_errors, _handle_cancelled_requests, _handle_first_token_response, _handle_responses

Signed-off-by: Jaedeok Kim <[email protected]>
@jaedeok-nvidia jaedeok-nvidia requested a review from a team as a code owner June 23, 2025 17:07
@jaedeok-nvidia jaedeok-nvidia self-assigned this Jun 23, 2025
@jaedeok-nvidia
Copy link
Collaborator Author

This is a draft PR for a discussion purpose. Please DO NOT MERGE yet.

@jaedeok-nvidia jaedeok-nvidia changed the title refactor: PyExecutor uses a list-type for response handling [DRAFT] refactor: PyExecutor uses a list-type for response handling Jun 23, 2025
@jaedeok-nvidia
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9611 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9611 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7064 completed with status: 'FAILURE'

@jaedeok-nvidia
Copy link
Collaborator Author

jaedeok-nvidia commented Jun 25, 2025

Closing the PR because it's no longer needed for #5415. If needed, we will revisit later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants