[DRAFT] refactor: PyExecutor uses a list-type for response handling #5406
PyExecutor._enqueue_responses now accepts a list of LlmResponse instead of a dictionary mapping req_id to response. Since each response already contains its request_id, using req_id as a dictionary key is redundant and causes issues when num_return_sequences > 1, where multiple responses share the same request_id.
This change aligns PyExecutor with C++ Executor's behavior, which returns std::vector. The previous dictionary-based approach would overwrite responses with the same request_id, losing all but the last response. With this fix, all responses for multi-sequence generation are properly preserved.
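The overwrite described above can be reproduced with a minimal sketch. This is not the actual PyExecutor code: `FakeLlmResponse`, its `text` field, and the two `collect_*` helpers are hypothetical stand-ins, and only the `request_id` attribute is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class FakeLlmResponse:
    """Hypothetical stand-in for LlmResponse (assumption, not the real class)."""
    request_id: int
    text: str  # illustrative payload, not a real field


def collect_as_dict(responses):
    # Old style: keyed by request_id, so a later response with the
    # same request_id replaces the earlier one.
    by_id = {}
    for resp in responses:
        by_id[resp.request_id] = resp
    return by_id


def collect_as_list(responses):
    # New style: keep every response; the request_id is read from
    # the response itself when needed.
    return list(responses)


# Two sequences for request 7 (as with num_return_sequences = 2),
# plus one sequence for request 8.
responses = [
    FakeLlmResponse(7, "seq-0"),
    FakeLlmResponse(7, "seq-1"),
    FakeLlmResponse(8, "seq-0"),
]

as_dict = collect_as_dict(responses)
as_list = collect_as_list(responses)
```

In this sketch `as_dict` holds two entries and has lost the first sequence of request 7, while `as_list` keeps all three responses in arrival order.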
Changes: