Optimizing the performance of think length limit using custom operators #4279
base: develop
Thanks for your contribution!
3bad98a to 73384a6
62dd5da to 6f1f082
…ehome/FastDeploy into upgrade_limit_think_length
…into upgrade_limit_think_length
…into upgrade_limit_think_length
LGTM.
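We need to check the extreme case of max_think_len=1 under PD disaggregation (separate prefill and decode instances).
You can refer to this PR: #4433
Some of the changes in that PR fix bugs in extreme cases.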
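To make the max_think_len=1 case concrete, here is a minimal CPU sketch of what a think-length limiter might do per batch row. It is not the kernel from this PR: only `next_tokens`, `max_think_lens`, and `step_idx` come from the signature quoted in this thread, while `status`, `think_end_id`, the "negative means no limit" convention, and the meaning of `step_idx` as the zero-based index of the token being emitted are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, NOT the PR's kernel.
// Assumptions: status[b] == 0 means "still thinking", 1 means "thinking ended";
// think_end_id is the single token id that closes the thinking section;
// step_idx[b] is the zero-based index of the token being emitted this step;
// a negative max_think_lens[b] disables the limit for that row.
void limit_thinking_reference(std::vector<int64_t>& next_tokens,
                              const std::vector<int>& max_think_lens,
                              const std::vector<int64_t>& step_idx,
                              std::vector<int>& status,
                              int64_t think_end_id) {
  for (std::size_t b = 0; b < next_tokens.size(); ++b) {
    if (max_think_lens[b] < 0) continue;  // no limit configured for this row
    if (status[b] == 1) continue;         // thinking already ended

    // Budget exhausted: overwrite the sampled token with the end marker.
    if (step_idx[b] >= max_think_lens[b]) {
      next_tokens[b] = think_end_id;
    }
    // Record the end of thinking, whether forced or produced by the model.
    if (next_tokens[b] == think_end_id) {
      status[b] = 1;
    }
  }
}
```

Under these assumptions, a row with max_think_len=1 keeps its token at step 0 and is forced to the end marker at step 1. If the decode instance in a PD-disaggregated deployment starts at step 1, its very first step must already apply the override, which is one plausible reason this case deserves a dedicated check.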
__global__ void limit_thinking_content_length_kernel_v2(
int64_t *next_tokens,
const int *max_think_lens,
const int64_t *step_idx, // step_idx no longer needs to be modified, so it is now const
Please delete this comment.
done
LGTM
…into upgrade_limit_think_length
Changed:
</think> for ernie4_5_vl, \n</think>\n\n for ernie_x1; a separate custom operator is implemented for each of the two.
Coming soon: