Optimize flash bert path for hpu device #509
Signed-off-by: kaixuanliu <[email protected]>
This PR optimizes the performance of the FlashBert path on HPU devices. With this optimization, mean latency drops from 6.4 ms to 4.32 ms, which finally matches the performance of tei-gaudi.
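For reference, the two latencies quoted above correspond to the following relative improvement (plain arithmetic on the reported means, no additional measurements):

$$
\frac{6.4 - 4.32}{6.4} = \frac{2.08}{6.4} = 0.325 \;(\approx 32.5\%\ \text{lower mean latency}),
\qquad
\frac{6.4}{4.32} \approx 1.48\times\ \text{speedup}
$$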
It seems you changed modeling code that also covers other devices. Did you validate GPU, CPU, and XPU? What's the performance?
For CPU and XPU devices, I only pass 2 extra args to the attention calculation, and these 2 args are used only in the `hpu_attn` calculation. The other change just replaces `torch.addmm` with `F.linear`, which I expect to have no performance difference. I validated output correctness on CPU. I will double-check the performance on both CPU and XPU, as well as the output on XPU.
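The `torch.addmm` → `F.linear` swap is numerically equivalent for a dense layer: `torch.addmm(bias, x, weight.t())` and `F.linear(x, weight, bias)` both compute `x @ weight.T + bias`. A minimal sketch, assuming an `nn.Linear`-style weight layout `[out_features, in_features]`; the tensor shapes below are illustrative and not taken from the PR:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only (not from the PR): 8 tokens, 16 -> 32 features.
torch.manual_seed(0)
x = torch.randn(8, 16)        # input activations: [tokens, in_features]
weight = torch.randn(32, 16)  # nn.Linear-style weight: [out_features, in_features]
bias = torch.randn(32)        # bias: [out_features], broadcast over tokens

# Original style: bias + x @ weight^T as a single fused add-matmul (2-D only).
out_addmm = torch.addmm(bias, x, weight.t())

# Replacement style: F.linear computes x @ weight^T + bias directly.
out_linear = F.linear(x, weight, bias)

# F.linear also accepts extra leading dims, e.g. [batch, seq, hidden].
x_3d = x.view(2, 4, 16)
out_3d = F.linear(x_3d, weight, bias)

print(torch.allclose(out_addmm, out_linear, atol=1e-5))
```

A practical difference is that `F.linear` handles inputs with extra leading dimensions, while `torch.addmm` requires 2-D matrices, so the caller does not need to flatten `[batch, seq, hidden]` activations first.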
I have double-checked the output correctness on XPU devices and the performance on both CPU and XPU: no change compared with the original implementation.
@@ -15,6 +15,7 @@
__all__ = ["Model"]


TRUST_REMOTE_CODE = os.getenv("TRUST_REMOTE_CODE", "false").lower() in ["true", "1"]

Let's remove this new blank line, as it's the only change in this file.
Oops, I have fixed it.