Optimize flash bert path for hpu device #509

kaixuanliu · 2025-03-07T06:40:20Z

No description provided.

kaixuanliu · 2025-03-07T06:50:09Z

This PR optimizes the performance of the flash BERT path on HPU devices. With this optimization, the mean latency drops from 6.4 ms to 4.32 ms, which brings it in line with the performance of tei-gaudi.

kaixuanliu · 2025-03-07T06:51:31Z

@Narsil @regisss please help review, thanks!

yao-matrix · 2025-03-10T02:19:01Z

It seems you changed modeling code that also covers other devices. Have you validated GPU, CPU, and XPU? What's the performance?

kaixuanliu · 2025-03-10T02:27:36Z

For CPU and XPU devices, I just pass 2 extra args to the attention calculation, and these 2 args are only used in the hpu_attn calculation. The other change just replaces torch.addmm with F.linear, which I expect to make no perf difference. I validated the output correctness on CPU. I will double check the perf on both CPU and XPU, and the output on XPU.
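
For readers following the torch.addmm to F.linear swap: the sketch below is a plain-Python reference (no torch needed) of why the two forms should give the same result. It assumes the original call had the shape `torch.addmm(bias, x, weight.T)`, which this thread does not show. The helper names `matmul`, `transpose`, `addmm_ref`, and `linear_ref` are hypothetical and exist only for illustration.

```python
# Plain-Python reference of the two formulations (illustrative only).
# torch.addmm(bias, x, w_t) computes   bias + x @ w_t
# F.linear(x, weight, bias) computes   x @ weight.T + bias
# Assumption: the replaced call passed the transposed weight as mat2.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]

def addmm_ref(bias, x, w_t):
    """Reference for the addmm form: bias is broadcast over the rows."""
    return [[bias[j] + row[j] for j in range(len(row))] for row in matmul(x, w_t)]

def linear_ref(x, weight, bias):
    """Reference for the linear form: weight is (out_features, in_features)."""
    prod = matmul(x, transpose(weight))
    return [[row[j] + bias[j] for j in range(len(row))] for row in prod]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # (batch=2, in_features=3)
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # (out_features=2, in_features=3)
bias = [10.0, 20.0]

out_addmm = addmm_ref(bias, x, transpose(weight))
out_linear = linear_ref(x, weight, bias)
print(out_addmm == out_linear)  # both forms agree on this input
```

This only illustrates the mathematical equivalence. Whether the two torch ops perform the same on a given device still needs measuring, as discussed in this thread.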

kaixuanliu · 2025-03-10T08:42:07Z

I have double-checked the output correctness on XPU devices and the perf on both CPU and XPU: no change compared with the original implementation.

regisss · 2025-03-10T17:43:07Z

backends/python/server/text_embeddings_server/models/__init__.py

@@ -15,6 +15,7 @@
 __all__ = ["Model"]

 TRUST_REMOTE_CODE = os.getenv("TRUST_REMOTE_CODE", "false").lower() in ["true", "1"]
+


Let's remove this new blank line as it's the only change in the file

Oops, I have fixed it.

kaixuanliu added 4 commits March 7, 2025 01:24

optimize flash bert for hpu device

67ea005

Signed-off-by: kaixuanliu <[email protected]>

nice code

d69268e

Signed-off-by: kaixuanliu <[email protected]>

small adjuest

80491dc

Signed-off-by: kaixuanliu <[email protected]>

Merge branch 'main' into flash_bert_hpu

3ae5d75

kaixuanliu added 2 commits March 7, 2025 09:54

small fix

fd4ab86

Signed-off-by: kaixuanliu <[email protected]>

Merge branch 'flash_bert_hpu' of https://github.com/kaixuanliu/text-e…

6a656e6

…mbeddings-inference into flash_bert_hpu

regisss reviewed Mar 10, 2025

small fix

7121111

Signed-off-by: kaixuanliu <[email protected]>

kaixuanliu mentioned this pull request Mar 11, 2025

upgrade to SynapseAI 1.20 version huggingface/tei-gaudi#41

Merged

regisss approved these changes Mar 11, 2025

Narsil approved these changes Mar 11, 2025

regisss merged commit 6e4133b into huggingface:main Mar 11, 2025
2 of 9 checks passed

kaixuanliu deleted the flash_bert_hpu branch June 5, 2025 01:56
