
Conversation

@kaiyux kaiyux commented Jun 17, 2025

You can use an `extra-llm-api-config.yml` file to enable the feature:

stream_interval: 4
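As a rough illustration of what `stream_interval: N` does (this is a hypothetical sketch, not the actual TensorRT-LLM implementation): the streaming path flushes accumulated tokens to the client every N generation iterations instead of on every token, with a final flush for any remainder.

```python
# Illustrative sketch of stream_interval batching (not TensorRT-LLM code).
# Tokens produced one per iteration are flushed every `stream_interval`
# iterations; leftover tokens are flushed when generation finishes.
def stream_with_interval(tokens, stream_interval=4):
    buffer = []
    for i, tok in enumerate(tokens, start=1):
        buffer.append(tok)
        if i % stream_interval == 0:
            yield list(buffer)   # one streamed response every N iterations
            buffer.clear()
    if buffer:                   # final flush of the remainder
        yield list(buffer)

chunks = list(stream_with_interval(range(10), stream_interval=4))
# chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Batching responses this way trades per-token latency for fewer response messages, which can reduce streaming overhead at high concurrency.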

Signed-off-by: Kaiyu Xie <[email protected]>
@kaiyux kaiyux requested review from a team as code owners June 17, 2025 12:09
@kaiyux kaiyux requested a review from Naveassaf June 17, 2025 12:09
Signed-off-by: Kaiyu Xie <[email protected]>
@hypdeb hypdeb (Collaborator) left a comment

Seems like a very small number of changes to enable that feature, which is great! Do you have some numbers on how this affects performance?

@kaiyux kaiyux requested review from syuoni and dongxuy04 June 17, 2025 13:49
@kaiyux (Member Author) commented Jun 17, 2025

/bot run

@pcastonguay pcastonguay (Collaborator) left a comment

Can we add tests to verify this is working as expected?

@tensorrt-cicd (Collaborator)

PR_Github #9213 [ run ] triggered by Bot

@kaiyux (Member Author) commented Jun 17, 2025

> Can we add tests to verify this is working as expected?

@pcastonguay I added tests to tests/integration/defs/accuracy/test_llm_api_pytorch.py (90c12f4) to make sure that no tokens are missed. However, if you meant verifying that it indeed returns responses every N iterations, I'm not sure what the best way to verify that is for now, and can take a closer look tomorrow.

@pcastonguay (Collaborator) commented:

Yes I meant verifying that you only get a response every N tokens.
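A check of that property could consume the stream and assert that every response except the last carries exactly N tokens. The sketch below uses a stand-in generator rather than the real LLM API, and the helper names are hypothetical; an actual integration test would drive the streaming endpoint and inspect the returned responses instead.

```python
# Hypothetical check that streamed responses arrive every N tokens.
# `stream_with_interval` is a stub standing in for the real streaming API.
def stream_with_interval(tokens, stream_interval):
    buffer = []
    for i, tok in enumerate(tokens, start=1):
        buffer.append(tok)
        if i % stream_interval == 0:
            yield list(buffer)
            buffer.clear()
    if buffer:
        yield list(buffer)

def check_stream_interval(chunks, n):
    # Every response except possibly the last must carry exactly n tokens;
    # the final flush may be shorter but never empty.
    assert all(len(c) == n for c in chunks[:-1])
    assert 0 < len(chunks[-1]) <= n

chunks = list(stream_with_interval(range(13), stream_interval=4))
check_stream_interval(chunks, 4)  # chunk sizes: 4, 4, 4, 1
```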

@kaiyux (Member Author) commented Jun 18, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #9259 [ run ] triggered by Bot

Signed-off-by: Kaiyu Xie <[email protected]>
@tensorrt-cicd (Collaborator)

PR_Github #9259 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6793 completed with status: 'FAILURE'

@kaiyux (Member Author) commented Jun 18, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #9297 [ run ] triggered by Bot

@QiJune QiJune (Collaborator) left a comment

LGTM

@tensorrt-cicd (Collaborator)

PR_Github #9421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6914 completed with status: 'FAILURE'

Signed-off-by: Kaiyu Xie <[email protected]>
@kaiyux kaiyux force-pushed the user/kaiyu/stream_interval branch from f80ccb6 to ad7ec5e Compare June 19, 2025 04:09
@kaiyux (Member Author) commented Jun 19, 2025

/bot run --disable-fail-fast

@kaiyux kaiyux enabled auto-merge (squash) June 19, 2025 04:10
@tensorrt-cicd (Collaborator)

PR_Github #9444 [ run ] triggered by Bot

@kaiyux (Member Author) commented Jun 19, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #9450 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #9444 [ run ] completed with state ABORTED

@kaiyux (Member Author) commented Jun 19, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #9465 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #9450 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #6939 completed with status: 'FAILURE'

@tensorrt-cicd (Collaborator)

PR_Github #9465 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6949 completed with status: 'FAILURE'

@kaiyux (Member Author) commented Jun 19, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #9482 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #9482 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6960 completed with status: 'SUCCESS'

@kaiyux kaiyux merged commit 7246fd7 into NVIDIA:main Jun 19, 2025
3 checks passed
@kaiyux kaiyux deleted the user/kaiyu/stream_interval branch July 3, 2025 00:56
dominicshanshan pushed commits to dominicshanshan/TensorRT-LLM that referenced this pull request, Jul 9 to Jul 11, 2025
7 participants